ABSTRACT


The interest in a system able to automatically identify the modulation type of an intercepted radio signal is increasingly evident for military and civilian purposes. Although some authors have looked at the problem in the past, nobody has found "the solution" and the problem remains. Therefore, an overview of the proposed techniques is useful in order to assess the current situation.

This document presents classification techniques and features (parameters characterizing the modulation types) used for modulation recognition.


RÉSUMÉ

Whether for military or civilian purposes, the value of a machine capable of automatically identifying the modulation type of an unknown signal is evident. Although some authors have looked at the problem in the past, none has obtained striking results indicating the path to follow. Consequently, there is no reason to concentrate one's efforts on a single author, but rather on the whole set of proposed techniques.

This document therefore covers the subject in a global way, presenting the classification techniques as well as the parameters used to characterize and distinguish the various modulation types, as proposed by the authors.


EXECUTIVE  SUMMARY 


The motivation of this document is the interest of military and civilian organisations in monitoring electromagnetic signal activity in the RF spectrum. Since the number of competent, trained human operators is constantly decreasing while radio activity in the HF and VHF bands is increasing, the value of a machine capable of automatically identifying the modulation type of an unknown intercepted signal is quite obvious. Integrating such a device into an ESM system that includes energy detection (spectral analysis), direction finding, and data fusion and correlation would allow an operator to drastically improve his efficiency and his ability to monitor activity in the RF spectrum.

Although some authors have looked at the problem in the past and proposed algorithms that achieve adequate performance for high-SNR signals, from an ESM perspective low-SNR signals are more likely to be intercepted. Therefore, the problem of finding good parameters able to discriminate among the modulation types of interest when the noise is significant still remains, and it is a very realistic one. More and more publications appear in the literature presenting new ideas and better performance. Also, with the appearance of better hardware processors, new possibilities are now available, and it is believed that a modulation recognition device will soon be able to classify very noisy signals in a short period of time with high accuracy. Although such a device is not yet available, the work done until now is certainly valuable and merits consideration.

This document introduces and presents techniques proposed in the open literature for automatic identification of the modulation type of an intercepted radio signal. Most of these techniques are based on the same classical pattern recognition theory, which is presented in a separate section since it is a prerequisite for understanding the following sections. The two main classification techniques used for modulation recognition, the linear classifier and the decision tree, are discussed more specifically.

Modulation recognition is concerned with analog as well as digital modulation types. Although there is a modern tendency to replace analog modulation types with digital ones, analog modulations (e.g., SSB, FM, AM) are still in use in many countries. The motivation to intercept these signals is reinforced by the fact that, once identified, they can also be demodulated to extract the message. This may not be the case with digital modulation types, since coding is usually involved: error-correcting codes, encryption, vocoders, etc.

Since the main differences between the approaches proposed by the cited authors lie in the features they use, the main part of the document consists of presenting these features and the corresponding results. More specific points are also considered, such as preprocessing to remove gaps in analog amplitude-modulated signals.

The purpose of this technical note is to give a comprehensive summary of all the publications on the topic of modulation recognition.
1.0 INTRODUCTION


From the earliest days of radio communication, the need to monitor electromagnetic signal activity in the RF spectrum has existed. Civilian authorities may wish to monitor the transmissions over their territory in order to maintain control over this activity. Military organizations may wish to monitor the radio activities of other powers for reasons of national security, among others.

Typically,  the  technique  employed  by  monitoring  stations  throughout  the  world  is 
based  upon  a  well-tried  method  which  has  been  in  use  for  many  years.  This  is  a  one 
man-one  receiver  situation  in  which  the  operator  spends  his  time  searching  the  RF 
spectrum  with  a  continuously  tunable  general  purpose  receiver,  hoping  to  make  an 
interesting interception. There are variations on that scenario. For example, it is possible
and  common  practice  for  an  operator  to  have  two  receivers.  One  is  used  for  searching, 
while  the  second  remains  tuned  to  a  known  interesting  frequency.  A  further  variation  is 
called  the  master-slave  technique.  In  this  method  a  group  of  operators  is  involved.  The 
master  operates  a  fast  tuning-sweep  receiver  with  an  associated  panoramic  display  unit. 
When  a  signal  of  interest  is  located,  this  intercepted  signal  is  transferred  to  one  of  the 
slaves,  who  tunes  to  the  appropriate  frequency  and  does  the  monitoring  with  a 
conventional,  general  purpose,  continuously  tunable  receiver. 

Neither of these techniques overcomes the fundamental problem of serious overcrowding in the radio spectrum. Moreover, the existing pool of highly skilled operators has begun to dry up and it is proving difficult to find replacements [1]. The classical method of monitoring is, after all, a very boring occupation. Electronic Support Measures (ESM) techniques therefore become an important alternative.

In advanced ESM systems, the operator is helped or replaced by sophisticated electronic machines. These machines are concerned with exploiting enemy electromagnetic emissions for the purpose of gathering intelligence as automatically as possible. This information is provided by analysis of the attributes of an intercepted signal. Thus, Modulation Recognition (MR) is an ESM technique: given an intercepted signal, it aims to identify its modulation type among a number of known possible modulation types. The terms modulation classification, recognition and identification are commonly used to describe this process.

Prior to modulation classification, the radio signal must be intercepted, which means that somewhere before the modulation classification system there is an energy detection system looking for electromagnetic emissions in the bandwidth of interest. Once a signal has been detected, the logical next step is to try to identify it. A step further is the demodulation of that signal and then its decryption to finally obtain the message itself.


Thus classification is neither energy detection nor normal signal demodulation with message extraction; it is something in between (see Figure 1).


Figure 1: Informational Relationships [2, p.312] (axes: information before signal processing; information gain after signal processing).


For energy detection, only the bandwidth of interest for the ESM system is known. On the other hand, demodulation with message extraction requires knowledge of parameters such as the center frequency, bandwidth, modulation type and data rate. The signal classifier should need only the information given by the energy detection system, i.e. where the signal is: its center frequency and bandwidth.

An example of an ESM system is illustrated in Figure 2. The intercepted signal is submitted to energy detection algorithms, then down-converted and bandpass filtered prior to modulation recognition. The information obtained from the modulation recognizer and from the energy detector is gathered by the system controller, which assigns the signal to a proper demodulator.

The particular problems related to modulation classification are caused by the radio channel and the ESM system components. The potential problems can be summarized as follows:


Effect                     Source

-Multipath fading          Radio channel
-Poor SNRs                 Radio channel
-Amplitude distortion      Receiver amplifier
-More than one signal      Energy detector
-Incomplete signal         Energy detector
-Frequency instability     Down-converter
Figure 2: Example of a simple ESM system.


Multipath fading is not discussed in the literature, but the problem of poor SNR is. Although classifiers for high SNRs exist, those for low SNRs are rare. According to some simulations [3][4][5][6], under ideal circumstances it would be possible to classify a signal when the SNR is above 8 dB (results for a real system and real signals could be different). All the classifiers presented in the literature assume only one signal in the passband of the narrowband filter. Thus it is assumed that the energy detection algorithm is able to separate every signal perfectly. This is not trivial. Frequency stability is also a problem raised in the literature [7]. The performance of the classifier should not be too affected by mistuning of the down-converter. Hipp [8] obtained acceptable performance with carriers within 10% of the typical value.

Furthermore, modulation recognition presents problems that are due to the nature of the signal itself. These can be summarized as follows:


Problem                    Source

-Insufficient signal       Short acquisition time
-Unmodulated segments      Gaps in the voice (analog modulations)
-Long computing time       Too many features


The performance of a classifier is closely related to the quantity of information available. Thus it is possible to improve the performance of a classifier with longer acquisition times (more sample points) and/or by computing more features from the signal. In both cases the classification processing time is increased. The acquisition time is especially long for analog modulations. AM, SSB, DSB and FM are afflicted with "unmodulated segments" caused by gaps in the voice. These gaps result in CW segments for AM and FM, and in noise segments for SSB and DSB. Gallant [3] presented a technique to remove these unmodulated segments at the price of an acquisition time of at least 1.5 s. For digital modulation types, the problem is to get enough symbol transitions; otherwise CW will be detected. The problem is especially obvious at low data rates. For example, the data rate of OOK could be as low as 10 Hz, in which case five seconds are required to get only 50 transitions. The effect of the acquisition time on system performance can be seen in [6], [9] and [10].
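The acquisition-time arithmetic in the OOK example can be made explicit. The Python sketch below assumes, as a simplification not stated in the text, that each symbol period contributes at most one transition, so the minimum acquisition time is the number of transitions wanted divided by the symbol rate. The function name is hypothetical.

```python
def min_acquisition_time(transitions_needed: int, symbol_rate_hz: float) -> float:
    """Lower bound (seconds) on the acquisition time of a digital signal.

    Assumes at most one symbol transition per symbol period; real traffic
    usually needs longer, since repeated identical symbols give no transition.
    """
    if symbol_rate_hz <= 0:
        raise ValueError("symbol rate must be positive")
    return transitions_needed / symbol_rate_hz


# OOK example from the text: 50 transitions at 10 symbols per second.
print(min_acquisition_time(50, 10.0))  # 5.0
```

Because the bound scales inversely with the rate, the same 50 transitions at 1000 symbols per second would need only 0.05 s, which is why the difficulty is concentrated at low data rates.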

The computing time depends mainly upon the number of sample points, the number of extracted features, the number of modulation types, and the complexity of the classification algorithm. The features are what allow the classifier to discriminate among the modulations, and more features can provide more discriminating power. Also, features are usually time invariant, which means they are computed over all the sample points. For these reasons, the features should be easy to compute, so that a reasonable overall computing time can be obtained. Also, although some highly sophisticated classification algorithms exist, the ones used in pattern recognition are simple. The overall computing time reported in the literature is about 2 seconds.

In the following sections the process of modulation recognition itself will be treated. Firstly, the concept of classification or recognition will be presented. Although they describe the same problem, the terms classification and recognition do not mean exactly the same thing. Recognition (from pattern recognition) includes an additional step before classification: feature extraction and/or selection, which prepares the data for classification. Two kinds of classifiers used for modulation recognition are presented: the linear classifier and the decision tree. They are widely used in real systems because they are simple and fast enough for real-time applications. Finally, the researchers in the area of MR will be summarized in a matrix, in preparation for the next sections, which present the features they used in their Modulation Recognition (MR) systems.

Section 3.0 presents MR systems capable of recognizing analog modulation types. The features useful for that purpose are highly sensitive to noise for an obvious reason: analog modulation schemes consist mainly of amplitude modulation (AM, SSB, DSB). However, a few authors have presented "less sensitive" features. Another topic introduced in this section is the problem of gaps in the voice, and the interesting solution proposed by Gallant is examined in detail.




The features for digital modulation schemes (Section 4.0) are quite different. Although the separation between Sections 3.0 and 4.0 is very arbitrary (most authors treat some of both), different features are required to obtain information on the phase or frequency of angle-modulated signals (BPSK, QPSK, FSK). These features are given in Section 4.0.

The two preceding sections present features used by pattern recognition algorithms for the purpose of modulation recognition. A few authors have proposed an alternative way of doing MR, using energy detection algorithms. This perspective is presented in Section 5.0. Unfortunately, the performance expected from these techniques is not discussed by the authors.

Finally,  Section  6.0  concludes  the  document  with  a  short  summary  and  comments. 
2.0 CLASSIFICATION TECHNIQUES


2.1  GENERAL 

Classification is the action of assigning individuals to one of two or more alternative classes (groups) on the basis of a set of inputs called features (variables). The populations are known to be distinct according to the features. As an example, consider an archeologist who wishes to determine which of two possible tribes created a particular statue found in a dig. The archeologist takes measurements of several characteristics of the statue and decides which tribe these measurements are most likely to have come from. The measurements of the statue may consist of a single observation such as its height; however, we would then expect a low degree of accuracy. If, on the other hand, the classification is based on several characteristics, we would have more confidence in the prediction.

Classification algorithms can be used in almost any area of knowledge; classification is at the heart of any decision process. Tou [11] divided the problems to which classification algorithms are applied into two major categories:

1.  The  study  of  human  beings  and  other  living  organisms, 

2. The development of theory and techniques for the design of devices capable of performing a given recognition task for a specific application.


The first subject area is concerned with such disciplines as sociology, psychology, physiology and the biomedical sciences. The second area is concerned with the computer and engineering aspects of the design of automatic pattern recognition systems. Pattern recognition can be defined as the categorization of input data into identifiable classes via the extraction of significant features followed by a classification process. Thus in pattern recognition we are talking of a two-step process: feature extraction and classification. Contrary to the preceding examples, in pattern recognition there are no direct features available. Although the data are available in digital form (after analog-to-digital conversion), more processing is required to give some meaning to these data. During the feature extraction process, the large quantity of data is translated into a few significant and discriminant features used by the classifier.

Pattern  recognition  spans  a  number  of  disciplines  and  problems,  as  shown  in  the 
following  list: 


-speech recognition          word identification
-speaker recognition         speaker identification
-speaker verification        speaker identification (knowing the words used)
-character recognition       character identification
-visual inspection           object anomaly identification
-ship recognition            identification of the kind of ship
-biomedical analysis         medical diagnosis
-weather prediction          weather forecast
-stock market prediction     predicted market ups and downs
Early pattern recognition research performed in the 1960s and 1970s focused on the asymptotic (infinite training data) properties of classifiers. Many researchers studied parametric Bayesian classifiers, where the form of the input distributions is assumed to be known, and the parameters of the distributions are estimated using techniques that require simultaneous access to all training data. These classifiers, especially those that assume Gaussian distributions, are still the most widely used because they are simple and are described in a number of textbooks.

The thrust of recent research has changed, much of it motivated by the desire to understand and build parallel neural net classifiers inspired by biological neural networks. This has led to an emphasis on robust, adaptive, non-parametric classifiers that can be implemented on parallel hardware. It is very likely that future modulation recognition systems will use this new technology.

In this chapter, some classical classification techniques will be presented. These techniques are used in pattern recognition as well as in the human sciences. The concept of feature extraction and selection will also be introduced. Then the particular problem of MR itself will be presented.


2.2  PATTERN  RECOGNITION  TECHNIQUES 

The goal of pattern recognition is to assign input patterns to one of $k$ classes. The input patterns consist of static input vectors $x$ containing $n$ elements (continuous or discrete values) denoted $x_1, x_2, \ldots, x_n$. These elements represent measurements of features selected to be useful for distinguishing between classes and insensitive to irrelevant variability in the input. Good classification performance requires the selection of effective features as well as the selection of a classifier that can make good use of those features with limited training data, memory, and computing power. During the training phase, a limited amount of training data and a priori knowledge concerning the expected output are used to adjust parameters and/or learn the structure of the classifier. Once the training is accomplished, the classifier is ready for the test phase, during which a new set of inputs is presented to the classifier without a priori knowledge (see Figure 3). The performance is then computed and presented, usually in a confusion table (percentage of correct classification for each class).
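A confusion table of the kind described above can be computed from the true and predicted labels collected during the test phase. The following is a minimal Python sketch; the function name and the example labels are hypothetical, not results from any cited author.

```python
from collections import Counter


def confusion_table(true_labels, predicted_labels):
    """Return {true_class: {predicted_class: percent}}.

    Each row gives the percentage of the test samples of one true class that
    were assigned to each predicted class; the diagonal entries are the
    per-class percentages of correct classification.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    classes = sorted(set(true_labels) | set(predicted_labels))
    pair_counts = Counter(zip(true_labels, predicted_labels))
    class_totals = Counter(true_labels)
    table = {}
    for t in classes:
        total = class_totals[t]
        table[t] = {
            p: (100.0 * pair_counts[(t, p)] / total if total else 0.0)
            for p in classes
        }
    return table


# Hypothetical test-phase results for three modulation types.
truth = ["AM", "AM", "AM", "AM", "FM", "FM", "SSB", "SSB"]
guess = ["AM", "AM", "AM", "FM", "FM", "FM", "SSB", "AM"]
table = confusion_table(truth, guess)
print(table["AM"]["AM"], table["SSB"]["SSB"])  # 75.0 50.0
```

Normalizing each row by its class total, rather than by the overall number of samples, keeps the per-class figures comparable even when the test set contains unequal numbers of each modulation type.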

The subject of feature selection and extraction is concerned with reducing the dimensionality of the pattern representation. Since the complexity of a classifier grows rapidly with the number of dimensions of the pattern space, it is important to base decisions only on the most essential, so-called discriminatory, information. Dimensionality reduction is also recommended from a classification performance point of view: initially performance improves as new features are added, but at some point, inclusion of further features will result in performance degradation.

Dimensionality reduction can be achieved in two different ways. One approach, called feature selection, is to identify and discard measurements that do not contribute significantly to class separability. The other approach, called feature extraction, consists of mapping the useful information into a lower-dimensional feature space (see Figure 4).
Figure 3: The two major phases of pattern classifier development [12, p.48].


Figure 4: Dimensionality reduction by feature extraction/selection (panels: dimensionality reduction by feature selection; dimensionality reduction by feature extraction).
To solve a feature selection and/or feature extraction problem, we need some sort of evaluation criterion. Unfortunately this is not a trivial problem. The quality of a set of features is very closely related to the classifier used. Therefore the optimal procedure is to try all the possible sets of features with the classifier and to retain the "best" set. In general this process requires too much computation (with $n$ candidate features there are $2^n - 1$ non-empty subsets to try), and therefore a number of alternative feature evaluation criteria exist.

It is not the purpose of this document to explain the details of feature evaluation criteria; however, the basic idea is given in Sections 2.2.1 and 2.2.2.


2.2.1  Feature  Selection 

We  will  now  briefly  discuss  how  to  evaluate  a  feature  by  using  the  following 
example.  Suppose  that  an  individual  may  belong  to  one  of  two  populations.  We  begin  by 
considering  how  an  individual  can  be  classified  into  one  of  these  populations  on  the  basis  of 
a measurement of one characteristic, say $X$. We have a representative sample of this measurement from each population. The distributions are represented in Figure 5.

From Figure 5, it is obvious that a good feature must have a small variance within each class and means that differ markedly between classes. Ideally the distributions should not overlap, so that there is no misclassification. If the two distributions have the same variance and the same prior probabilities, the decision rule is quite trivial: the threshold is at the intersection of the distributions.
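For the equal-variance, equal-prior case, and assuming for illustration (the text does not require it) that both class distributions are Gaussian, the two densities intersect halfway between the means. The two misclassification areas of Figure 5 can then be computed with the normal cumulative distribution function. A minimal Python sketch with hypothetical numbers:

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a Gaussian N(mean, std**2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def equal_variance_threshold(mean1: float, mean2: float, std: float):
    """Threshold and error rates for two Gaussian classes with mean1 < mean2.

    With equal variances and equal priors the densities intersect halfway
    between the means; samples below the threshold go to population I.
    """
    threshold = 0.5 * (mean1 + mean2)
    # Members of population I lying above the threshold are misclassified.
    err_1_as_2 = 1.0 - normal_cdf(threshold, mean1, std)
    # Members of population II lying below the threshold are misclassified.
    err_2_as_1 = normal_cdf(threshold, mean2, std)
    return threshold, err_1_as_2, err_2_as_1


t, e12, e21 = equal_variance_threshold(0.0, 2.0, 1.0)
print(t, round(e12, 4), round(e21, 4))  # 1.0 0.1587 0.1587
```

Moving the means apart or shrinking the common standard deviation reduces both error areas, which is the sense in which a good feature has small within-class variance and well-separated class means.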

Combining more variables (or features) may provide better classification accuracy. Consider two variables, $X_1$ and $X_2$, with distributions similar to that of $X$. By combining the two variables according to a linear function, $Z = a_1X_1 + a_2X_2$, we get a two-dimensional decision region (see Figure 6). Usually $Z$ is called a canonical function.
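One common way to choose the coefficients $a_1$ and $a_2$ of the canonical function is Fisher's criterion, which maximizes the separation of the projected class means relative to the within-class scatter; the resulting coefficient vector is proportional to $S_w^{-1}(m_1 - m_2)$, where $S_w$ is the pooled within-class scatter matrix and $m_1$, $m_2$ are the class means. The plain-Python sketch below, for two features only, illustrates that choice; the function names and the data are hypothetical.

```python
def mean2(samples):
    """Mean of a list of (x1, x2) pairs."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)


def scatter2(samples, m):
    """Scatter matrix sum((x - m)(x - m)^T), returned as (a, b, c) for [[a, b], [b, c]]."""
    a = sum((s[0] - m[0]) ** 2 for s in samples)
    b = sum((s[0] - m[0]) * (s[1] - m[1]) for s in samples)
    c = sum((s[1] - m[1]) ** 2 for s in samples)
    return a, b, c


def fisher_coefficients(class1, class2):
    """Return (a1, a2) proportional to Sw^-1 (m1 - m2) for two features."""
    m1, m2 = mean2(class1), mean2(class2)
    s1, s2 = scatter2(class1, m1), scatter2(class2, m2)
    # Pooled within-class scatter Sw = S1 + S2.
    a, b, c = (u + v for u, v in zip(s1, s2))
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    return ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)


low = [(0, 0), (1, 0), (0, 1), (1, 1)]
high = [(3, 3), (4, 3), (3, 4), (4, 4)]
a1, a2 = fisher_coefficients(low, high)
print(a1, a2)  # -1.5 -1.5
```

Only the direction of $(a_1, a_2)$ matters for separation; with these data every projected value $Z$ of the first class lies in $[-3, 0]$ and every value of the second in $[-12, -9]$, so a single threshold on $Z$ separates them.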

The example presented in Figure 6 shows that by linearly combining some features, a decision region is created in which there are clusters. The samples are gathered in clusters according to similar patterns. In the classical approach discussed here, there is one cluster per class. As shown in Figure 6, by adding a feature the clusters should become more distinct, with less overlap (in the N-dimensional feature space). Thus the feature selection criterion should evaluate the distance between clusters and verify the amount of overlap. There are numerous ways to evaluate this distance, the two most common probably being the Mahalanobis and Euclidean distances. For a complete list of distance measurement criteria, see [13].
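The two distances mentioned above can be compared on a pair of cluster means. The Mahalanobis distance weights the mean difference by the inverse of a covariance matrix (here assumed to be a single shared covariance), so a separation along a high-variance feature counts for less than the same separation along a low-variance one. A minimal two-feature Python sketch; the names and numbers are hypothetical.

```python
import math


def euclidean(m1, m2):
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m1, m2)))


def mahalanobis_2d(m1, m2, cov):
    """Mahalanobis distance between two 2-D means for covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # d^2 = delta^T C^-1 delta, with C^-1 = [[c, -b], [-b, a]] / det.
    d2 = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(d2)


cov = ((4.0, 0.0), (0.0, 1.0))  # feature 1 has four times the variance of feature 2
print(euclidean((0, 0), (2, 0)), mahalanobis_2d((0, 0), (2, 0), cov))  # 2.0 1.0
print(euclidean((0, 0), (0, 2)), mahalanobis_2d((0, 0), (0, 2), cov))  # 2.0 2.0
```

Both pairs of means are equally far apart in the Euclidean sense, but the Mahalanobis distance rates the separation along the noisier first feature as half as large, which better reflects how much the clusters actually overlap.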

Unfortunately, even with distinct and non-overlapping clusters, the classifier may not provide the performance expected according to a feature evaluation criterion. Depending upon the shape of the clusters, some classifiers may not be able to adequately divide the decision region (this will be explained in Section 2.2.3). Therefore the real evaluation is done by a stepwise analysis: trying the desired classifier with the features, adding and removing them, and comparing the results.
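The adding half of this stepwise analysis can be sketched as sequential forward selection: starting from no features, at each step the feature whose addition gives the best classifier score is added, and the loop stops when no addition improves the score. In this minimal Python sketch, `score` stands for whatever evaluation the user chooses (for example, the percentage of correct classification of the chosen classifier on test data), and the toy scoring table is hypothetical. A full stepwise procedure would also try removing features at each step, which is omitted here for brevity.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_selection(features: Iterable[str],
                      score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Greedy forward feature selection driven by a classifier score."""
    remaining = set(features)
    selected: List[str] = []
    best = score(frozenset())
    while remaining:
        # Try adding each remaining feature and keep the best candidate.
        candidate, cand_score = max(
            ((f, score(frozenset(selected + [f]))) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if cand_score <= best:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected


# Hypothetical scores (percent correct) for every feature subset.
toy_scores = {
    frozenset(): 33.0,
    frozenset({"f1"}): 60.0, frozenset({"f2"}): 70.0, frozenset({"f3"}): 50.0,
    frozenset({"f1", "f2"}): 85.0, frozenset({"f1", "f3"}): 65.0,
    frozenset({"f2", "f3"}): 75.0, frozenset({"f1", "f2", "f3"}): 80.0,
}
print(forward_selection(["f1", "f2", "f3"], toy_scores.__getitem__))  # ['f2', 'f1']
```

In the toy table the full set scores lower than the best pair, mirroring the performance degradation that can follow the inclusion of further features; the greedy loop stops before adding `f3`.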
Figure 5: Distribution of the feature for the two classes (population I and population II; the marked areas give the percent of members of population I incorrectly classified into population II and the percent of members of population II incorrectly classified into population I).

Figure 6: Decision Region: 2 features and 2 classes.
2.2.2 Feature Extraction


As pointed out in Section 2.2, in feature extraction all the features are used in order to create a lower-dimensional space, thus reducing the classifier complexity. Information compression is achieved by a mapping process in which all the useful information contained in the original observation vector $y$ is converted into a few composite features forming the vector $x$, while redundant and irrelevant information is ignored:

$$x = \phi(y), \qquad y = [y_1, y_2, \ldots, y_n]^T, \qquad x = [x_1, x_2, \ldots, x_m]^T, \qquad n > m$$

Although non-linear mapping is possible, linear mapping is much more common and has the advantage of being computationally feasible. The mapping function then becomes a matrix multiplication:


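$$y = Tx, \qquad T = [t_1, t_2, \ldots, t_n], \qquad T^T T = I$$

where the $t_i$ are column vectors and $T$ is orthonormal, so that $y$ is expanded exactly on $n$ basis vectors. The retained features are obtained using

$$x_i = t_i^T y, \qquad i = 1, 2, \ldots, m$$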

This kind of compression technique is called Principal Component Analysis (PCA) in the statistical literature [14, pp. 309-330], and the Karhunen-Loève expansion in the pattern recognition literature [15, pp. 226-250], [16].

PCA can be summarized as a method of transforming the original variables into new uncorrelated variables to avoid redundancy. The new variables are called the principal components. Each principal component is a linear combination of the original variables. The principal components are chosen to keep the mean-square error between $y$ and $\hat{y}$ minimal, where $\hat{y}$, the estimate of $y$, is defined by:

$$\hat{y} = \sum_{i=1}^{m} x_i t_i + \sum_{i=m+1}^{n} b_i t_i$$

where the $b_i$ are preselected constants.

The columns of $T$ are the eigenvectors of the covariance (or autocorrelation) matrix of $y$ [15, p.236], ordered by decreasing eigenvalue, so that the $m$ retained components carry the largest eigenvalues.
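As an illustration of the Karhunen-Loève (PCA) mapping in the simplest case $n = 2$, $m = 1$, the sketch below computes the covariance matrix of two-feature data, takes its dominant eigenvector as $t_1$, keeps $x_1 = t_1^T y$, and replaces the discarded component by the constant $b_2 = t_2^T \bar{y}$ (the projection of the sample mean, an assumed choice of the preselected constant). With that choice the mean-square error equals the discarded eigenvalue. The closed-form 2 × 2 eigen-decomposition and the data are assumptions made to keep the example in plain Python.

```python
import math


def pca_2d(samples):
    """Karhunen-Loeve / PCA for 2-D samples.

    Returns (mean, t1, t2, lam1, lam2): the sample mean, the orthonormal
    eigenvectors t1, t2 of the covariance matrix (1/N normalization) and
    their eigenvalues, ordered so that lam1 >= lam2.
    """
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    a = sum((s[0] - mx) ** 2 for s in samples) / n
    b = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
    c = sum((s[1] - my) ** 2 for s in samples) / n
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]].
    half_trace = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # The major axis lies at angle 0.5 * atan2(2b, a - c).
    theta = 0.5 * math.atan2(2 * b, a - c)
    t1 = (math.cos(theta), math.sin(theta))
    t2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), t1, t2, lam1, lam2


def reconstruction_mse(samples, mean, t1, t2):
    """Mean-square error when each y is rebuilt as x1*t1 + b2*t2."""
    b2 = t2[0] * mean[0] + t2[1] * mean[1]  # preselected constant for the dropped term
    total = 0.0
    for y in samples:
        x1 = t1[0] * y[0] + t1[1] * y[1]  # retained feature x1 = t1^T y
        yhat = (x1 * t1[0] + b2 * t2[0], x1 * t1[1] + b2 * t2[1])
        total += (y[0] - yhat[0]) ** 2 + (y[1] - yhat[1]) ** 2
    return total / len(samples)


data = [(2.0, 1.9), (0.5, 0.7), (1.5, 1.2), (3.0, 3.1), (2.5, 2.3), (1.0, 1.1)]
mean, t1, t2, lam1, lam2 = pca_2d(data)
print(abs(reconstruction_mse(data, mean, t1, t2) - lam2) < 1e-9)  # True
```

Keeping the eigenvector with the larger eigenvalue is what makes the error small: had $t_2$ been retained instead, the error would have been $\lambda_1$.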
2.2.3 Classification Techniques


The concern in this section is the determination of the optimum decision procedures needed in the identification process. Once the observed data from the patterns to be recognized have been expressed as feature vectors, a machine is required to decide to which class $\omega_i$ these data belong.

By and large, the only generally valid statistical decision theory is based upon the average cost or loss of misclassification, formulated in terms of the Bayes expressions for conditional probabilities (briefly called the "Bayes classifier"). In standard pattern recognition theory it is reasonably accurate to assume that the unit misclassification cost is the same for all classes. Assume that $x$ is the vector of input observations (pattern elements, sets of attributes, ...) and $\{\omega_i, i = 1, 2, \ldots, k\}$ is the set of classes to which $x$ may belong. Let $p(x|\omega_i)$ be the probability density function of $x$ in class $\omega_i$, and $P(\omega_i)$ the a priori probability of occurrence of samples from class $\omega_i$; in other words, $d_i(x) = p(x|\omega_i)P(\omega_i)$ corresponds to the class distribution of those samples of $x$ which belong to class $\omega_i$. Here the $d_i(x)$ are called the discriminant functions. The average rate of misclassification is minimized if $x$ is classified according to the following rule:

$$x \text{ is assigned to } \omega_i \iff d_i(x) > d_j(x) \quad \forall j \neq i$$

The main problem, of course, is to obtain analytic expressions for the $d_i(x)$. Notice that even a large number of samples of $x$ does not, as such, define any analytical probability density function. One has to use either parametric or nonparametric methods.

Parametric (also called probabilistic) methods assume a known form of probability distribution (such as Gaussian) for the input features. The parameters of the distributions (means, variances, covariances, ...) are estimated using supervised training where all data are assumed to be available simultaneously. These classifiers provide optimal performance when the underlying distributions are accurate models of the test data and sufficient training data are available to estimate the distribution parameters accurately. Although these two conditions are not necessarily satisfied in real-world applications, these classifiers are popular since they are simple and sufficiently efficient in many cases. In the literature on MR, the most common choice is Fisher's [17] linear classifier, presented in Section 2.2.3.1.
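As an illustration of a parametric classifier applying the Bayes rule given above (assign $x$ to the class with the largest $d_i(x) = p(x|\omega_i)P(\omega_i)$), the sketch below assumes one feature per sample and a Gaussian density per class, estimates each mean and variance from labelled training data, and takes the priors as the class frequencies in the training set. It compares logarithms of $d_i(x)$, which preserves the ordering. The data and class names are hypothetical; this is not Fisher's classifier or any particular author's MR system.

```python
import math
from collections import defaultdict


def train_gaussian(samples, labels):
    """Estimate (mean, variance, prior) per class from 1-D training data."""
    groups = defaultdict(list)
    for x, label in zip(samples, labels):
        groups[label].append(x)
    model = {}
    for label, xs in groups.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        if var == 0.0:
            raise ValueError(f"class {label!r} has zero variance")
        model[label] = (mean, var, len(xs) / len(samples))
    return model


def discriminant(x, mean, var, prior):
    """Logarithm of d_i(x) = p(x|w_i) P(w_i) for a Gaussian class density."""
    return (-0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var) + math.log(prior))


def classify(x, model):
    """Assign x to the class whose discriminant is largest."""
    return max(model, key=lambda label: discriminant(x, *model[label]))


train_x = [0.9, 1.1, 1.0, 1.2, 0.8, 3.1, 2.9, 3.0, 3.2, 2.8]
train_y = ["class_1"] * 5 + ["class_2"] * 5
model = train_gaussian(train_x, train_y)
print(classify(1.05, model), classify(2.7, model))  # class_1 class_2
```

The sketch only works as well as the Gaussian assumption and the amount of training data allow, which are exactly the two conditions the text identifies for parametric classifiers.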

Although nonparametric techniques such as the k-nearest neighbor classifier exist, they are not popular classifiers for MR, being too complex and time consuming. They also require huge amounts of training data. One exception is the binary tree classifier (also called decision tree, classification tree, ...). Since it is used for MR, it will be presented in Section 2.2.3.2.


2.2.3.1 Linear Classifiers

It is assumed that a pattern vector $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$, belonging to one of the classes $\omega_i$, $1 \le i \le k$ ($\omega_i$ are the possible classes), is presented to a classifier. As shown in Figure 6, for two classes it is possible to draw a straight line between them and call this line the decision boundary, threshold, or discriminant function $d(\mathbf{x})$:

$d(\mathbf{x}) = W_0 + W_1 x_1 + W_2 x_2$

$d(\mathbf{x}) > 0 \Rightarrow \mathbf{x} \in \omega_i$
$d(\mathbf{x}) < 0 \Rightarrow \mathbf{x} \in \omega_j$


For the general multiclass case, there are as many discriminant functions as classes, and the decision is taken according to the rule

$\mathbf{x}$ is assigned to $\omega_i$ iff $d_i(\mathbf{x}) > d_j(\mathbf{x}), \forall j \neq i$


For the two-class case above, we can therefore write

$d_1(\mathbf{x}) = W_0 + W_1 x_1 + W_2 x_2$
$d_2(\mathbf{x}) = -W_0 - W_1 x_1 - W_2 x_2$


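Usually, in pattern recognition, matrix notation is preferred. A new (augmented) vector z is introduced:

$\mathbf{z} = [1, x_1, x_2, \ldots, x_m]^T$

$\mathbf{W}_1 = [W_0, W_1, \ldots, W_m]^T$

$\mathbf{W}_2 = [-W_0, -W_1, \ldots, -W_m]^T$

$d_1 = \mathbf{W}_1^T \mathbf{z}$
$d_2 = \mathbf{W}_2^T \mathbf{z}$

and, for k classes,

$\mathbf{d} = \mathbf{W}^T \mathbf{z}$

$\mathbf{d} = [d_1, d_2, \ldots, d_k]^T$
$\mathbf{W} = [\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_k]$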
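The matrix form lends itself to a direct implementation. The sketch below (Python with NumPy; the weights are chosen arbitrarily for illustration) builds the augmented vector z, evaluates $\mathbf{d} = \mathbf{W}^T\mathbf{z}$ and picks the class with the largest discriminant.

```python
import numpy as np


def linear_discriminants(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """d = W^T z with the augmented vector z = [1, x1, ..., xm]^T.

    W has shape (m + 1, k): column i holds W_i = [W0, W1, ..., Wm]^T.
    """
    z = np.concatenate(([1.0], x))
    return W.T @ z


def linear_classify(x: np.ndarray, W: np.ndarray) -> int:
    """Index of the class whose discriminant d_i(x) is largest."""
    return int(np.argmax(linear_discriminants(x, W)))


# Two-class example from the text: W2 = -W1, so d2 = -d1.
# Hypothetical weights giving d1 = -1 + x1 + x2.
W1 = np.array([-1.0, 1.0, 1.0])
W = np.column_stack((W1, -W1))
```

For x = [1, 1], d = [1, -1] and the first class wins; for x = [0, 0], d = [-1, 1] and the second class wins, which is the same as testing the sign of $d(\mathbf{x})$ in the two-class formulation.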


Figure 7 shows a few possible cluster shapes. It is clear from the figure that a linear classifier cannot efficiently discriminate some of them, and when there are several clusters there is no guarantee that a linear classifier can find suitable linear functions. Other kinds of classifiers are available, however, such as the quadratic classifier.
Figure 7: Possible cluster shapes (two classes) [18, p.35].

a) Compact and well-separated clusters.
b) Touching clusters.
c) Concentric clusters.
d) Linearly nonseparable clusters.
e) Multi-modal clusters.
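For a single discriminant function, the linear classifier was expressed as

$d_i = \mathbf{W}_i^T \mathbf{z}$, i.e. $d_i = \mathbf{w}_i^T \mathbf{x} + c_i$,

with $\mathbf{w}_i = [W_1, \ldots, W_m]^T$ and $c_i$ being the threshold, i.e. $W_0$.

The quadratic classifier is expressed as

$d_i = \mathbf{x}^T \mathbf{V}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + c_i$

where $\mathbf{V}_i$ is an $m \times m$ matrix of quadratic weights.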


The decision boundaries are then curves instead of straight lines, so clusters such as those in Figure 7-d can be properly discriminated. However, the computing and memory requirements are significantly higher, and in some situations the performance improvement obtained with the quadratic classifier is too small to justify its complexity.
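To make the difference concrete, the sketch below uses hand-picked quadratic weights $\mathbf{V} = -\mathbf{I}$, $\mathbf{w} = \mathbf{0}$, $c = 4$, so that $d(\mathbf{x}) = 4 - \|\mathbf{x}\|^2$ and the boundary is a circle of radius 2. It separates two concentric rings of synthetic points (as in Figure 7-c), which no single straight line can do. The data and weights are assumptions chosen for illustration.

```python
import numpy as np


def quadratic_discriminant(x, V, w, c) -> float:
    """d(x) = x^T V x + w^T x + c; d > 0 -> class w_i, d < 0 -> class w_j."""
    x = np.asarray(x, dtype=float)
    return float(x @ V @ x + w @ x + c)


# Hand-picked weights: d(x) = 4 - |x|^2 (boundary circle of radius 2)
V = -np.eye(2)
w = np.zeros(2)
c = 4.0

# Synthetic concentric clusters: inner ring of radius 0.5, outer ring of radius 3
rng = np.random.default_rng(1)
angles = rng.uniform(0.0, 2.0 * np.pi, 50)
unit = np.column_stack((np.cos(angles), np.sin(angles)))
inner = 0.5 * unit
outer = 3.0 * unit

inner_ok = all(quadratic_discriminant(p, V, w, c) > 0 for p in inner)
outer_ok = all(quadratic_discriminant(p, V, w, c) < 0 for p in outer)
```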


2.2.3.2 Decision Tree Classifiers

Decision tree classifiers are hyperplane classifiers which have been developed extensively over the past 10 years. They offer a rather different method of discriminant analysis, which portrays the problem in terms of a binary tree. The tree provides a hierarchical representation of the data space that can be used for classification by tracing a path through the tree.

This line of development started in 1963 and has attracted growing interest in the last 10 years, leading to a large number of algorithms able to create binary trees and multiple binary trees (for the latter, see [19]). A classical technique, called CART, will be presented here.

In its simplest form, the CART [20] method produces a tree based upon individual variables. For example, the first split, at the root of the tree, might be determined by the question "Is $x_5 < 6.2$?". This determines a left and a right branch. The left branch, corresponding to $x_5 < 6.2$, might then be divided according to the question "Is $x_3 > 1.4$?", and the right branch, for which $x_5 \ge 6.2$, might be split according to the question "Is $x_1 > 0$?". The methodology has three components: the set of questions, the rules for selecting the best splits, and the criterion for choosing the size of the tree. Once the tree is trained, each terminal node is associated with one of the classes $\omega_i$.
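The split-selection component can be illustrated with a short sketch. CART commonly measures node impurity with the Gini index; the code below (Python/NumPy, with hypothetical toy data) searches questions of the form "Is $x_j < t$?" over all features and midpoint thresholds and keeps the one with the largest impurity reduction. The third component, the rule for stopping and pruning the tree, is not shown.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))


def best_split(X: np.ndarray, y: np.ndarray):
    """Find the question 'Is x_j < t?' giving the largest drop in weighted Gini.

    Returns (feature index j, threshold t, impurity gain); (None, None, 0.0)
    if no split reduces the impurity.
    """
    n, m = X.shape
    parent = gini(y)
    best = (None, None, 0.0)
    for j in range(m):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] < t
            right = ~left
            child = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            gain = parent - child
            if gain > best[2]:
                best = (j, float(t), float(gain))
    return best


# Toy data: feature 1 separates the classes, feature 0 does not
X = np.array([[1.0, 0.0], [2.0, 0.5], [1.5, 5.0], [0.5, 6.0]])
y = np.array([0, 0, 1, 1])
j, t, gain = best_split(X, y)
```

On this toy data the search selects feature 1 with threshold 2.75, which splits the node into two pure children.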

More sophisticated questions can also be handled, such as "Is $\sum_i W_i x_i <$ Threshold?". Numerous questions are possible and can be mixed in the same tree. Although the concept is very simple, the implementation of efficient algorithms able to optimize the tree is not: at each node the algorithm must select the best question and the best feature, and it must know when to stop growing the tree. Note also that it is a nonparametric procedure requiring complex and efficient training algorithms, as well as a considerable amount of training data. This alternative is especially interesting when many classes are involved and many features are available.
2.3 MODULATION RECOGNITION


The  preceding  techniques  will  now  be  applied  to  solve  the  problem  of  Modulation 
Recognition  (MR).  As  introduced  in  the  first  chapter,  the  signal  presented  to  the  MR 
system  is  intercepted  by  a  receiver,  down-converted  and  bandpass  filtered  (see  Figure  8). 


Figure 8: General architecture of an MR system.


The intercepted signal is usually derived from a computer-controlled intercept receiver; for example, Gallant [3] cites the Watkins-Johnson WJ-8607 receiver [21]. The RF spectrum is scanned to find a signal. When a signal is detected, its frequency is roughly known, so the signal can be down-converted to an intermediate frequency (IF), which is usually fixed. The output of the receiver is thus an IF signal ready for the classifier.

The preprocessing stage varies a lot from author to author, and can even be nonexistent. An example of preprocessing is given by Gallant [3], who uses a rather complex three-step preprocessing scheme to remove the "unmodulated" parts of AM, FM, DSB and SSB signals. These "unmodulated" parts are caused by gaps in the voice, mainly between words.

The preprocessing stage can be either analog or digital. Although most of the authors used DSP boards, some preferred analog preprocessing. For example, Winkler [22] sent the IF signal to a bank of parallel analog demodulators, one for each modulation type, and Calian (previously Miller) [23]-[27] used three analog PLL demodulators, one each for AM, FM and DSB.

To allow some versatility and flexibility, feature computation is performed digitally. Given an intercepted signal, a feature is a signal characteristic useful to discriminate among the possible modulation types. These features should be as discriminative as possible, even when the noise level is significant; most classifiers do not perform well at low SNRs.
The decision procedures considered by the authors are represented by the two techniques explained in Section 2.2: the linear and the decision tree classifiers. The decision tree technique considered here takes its simplest form: questions of the type "Is $x_i > 3.2$?", where 3.2 would have been obtained during the training phase. Moreover, the tree is not optimized: instead of using a sophisticated algorithm like CART to establish the optimal splits and thresholds, the sets of rules are defined empirically by looking at the training data. Finally, note that the tree is sometimes presented in alternative ways: as Boolean equations or as a logic table (see Figure 9).


a) Decision region (diagram not reproduced)

b) Boolean equations
   A = (x1 > Th1)(x2 > Th2)
   B = (x2 < Th2)
   C = (x1 < Th1)(x2 > Th2)

c) Logic table (X = don't care)
   Class   x1 > Th1?   x2 > Th2?
   A       1           1
   B       X           0
   C       0           1

Figure 9: Illustration of the logic tree decision process.
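The Boolean-equation form of Figure 9 translates directly into code. In this sketch the thresholds are hypothetical parameters; a point lying exactly on a threshold satisfies none of the equations and is left unclassified (returned as `None`).

```python
from typing import Optional


def classify_fig9(x1: float, x2: float, th1: float, th2: float) -> Optional[str]:
    """Evaluate the Boolean equations of Figure 9.

    A = (x1 > Th1)(x2 > Th2), B = (x2 < Th2), C = (x1 < Th1)(x2 > Th2).
    The three conditions are mutually exclusive, so at most one can hold.
    """
    a = x1 > th1 and x2 > th2
    b = x2 < th2                 # x1 is a don't-care (X) for class B
    c = x1 < th1 and x2 > th2
    hits = [name for name, hit in (("A", a), ("B", b), ("C", c)) if hit]
    return hits[0] if hits else None
```

Read as a tree, the root asks "Is x2 > Th2?"; the "no" branch is class B and the "yes" branch asks "Is x1 > Th1?" to separate A from C.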


Most authors, especially the more recent ones, use a linear classifier. However, one author, Jondral [28][29], obtained slightly better results with a quadratic classifier.

To collect the information on MR, a literature search was conducted. All the papers reported here are unclassified. They report an interest in recognizing both analog and digital modulations: AM, SSB, DSB, FM, ASK, OOK, BPSK, QPSK and FSK. From a military perspective, Torrieri [30] states in his book that the importance of analog communication systems is declining, probably due to the proliferation of digital computers and the security provided by cryptographic digital communication. Thus, in recent years, more papers have been published on digital modulation classifiers. Nevertheless, analog communications are still in use and the problem of analog modulation classification retains some interest.




The techniques for modulation classification described in the following pages have been grouped according to whether the author put more emphasis on analog or digital modulation schemes. Within each group, the authors have been gathered according to similarities in their approaches, as shown in Table 1. In the table, "?" refers to a modulation type for which it was not clear whether or not the MR algorithm was able to recognize it.
Table 1.
Columns: GROUP, AUTHOR, SECTION, MODULATION TYPE (analog: SSB, DSB, FM; digital: ASK, FSK, BPSK, QPSK; OTHER)

ANALOG: Miller, Luiz, Wakeman, Fry, Gadbois, Ribble, UTL, Gallant, Weaver, Winkler, Callaghan, Fabrizi, Petrovic, Aisbett, Einicke, Hipp
DIGITAL: Liedtke, Jondral, Dominguez, Adams, Mammone, DeSimio, Ready
ENERGY DETECTOR: Kim, Gardner

Entries in the OTHER column include 4-FSK, 8-PSK, 4-ASK and BB.
3.0 MR FOR ANALOG MODULATIONS


This group is larger for a historical reason: the first paper was written in 1969, whereas interest in digital modulation is more recent. What characterizes this group is the choice of features. Because analog modulations are dominated by amplitude modulation (AM, DSB, SSB), the signal envelope is a characteristic exploited by several authors, starting with Gadbois in 1985. Before that, authors were using a hardware approach, as we will see with Miller.


3.1  MILLER  APPROACH 

3.1.1  Miller 

In 1978, Miller Communications Systems Limited (now Calian Communications Systems Limited) presented a report [23] to DREO on the feasibility of an HF/VHF spectrum surveillance receiver, including automatic modulation type identification. DREO liked the idea and prepared a contract for a prototype ESM receiver. The approach developed by Miller for modulation classification prevailed for years and led to a number of publications [23]-[27], [31], [32], [7].

In their implementation, the receiver produces a 455 kHz IF signal (with a bandpass filter of bandwidth either 6 or 24 kHz), which is fed to an APD (Amplitude Probability Distribution) envelope sampler and to three parallel phase-lock-loop (PLL) demodulator circuits (see Figure 10).

The method used to characterize the signal amplitude envelope is called APD (Amplitude Probability Distribution). As shown in Figure 10, the output of the envelope detector is sampled by an 8-bit ADC (with a sampling rate of 1.7 kHz). N points were used to compute the following statistics:


Mean: $\mu = \frac{1}{N}\sum x$

Variance: $s^2 = \frac{1}{N}\left[\sum x^2 - \frac{1}{N}\left(\sum x\right)^2\right]$

$P(x)$: probability that the ADC output is equal to x, corresponding to the APD (see Figure 11).
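A minimal sketch of these APD statistics, assuming the envelope samples are available as 8-bit integers (0 to 255) in a NumPy array, is shown below. The function name is hypothetical; it computes the mean, the variance exactly as written above, the histogram estimate of $P(x)$, and $P(0)$.

```python
import numpy as np


def apd_features(samples):
    """APD statistics of 8-bit envelope samples (integers 0..255).

    Returns (mean, variance, apd, p0) where
    variance = (1/N) [sum x^2 - (1/N)(sum x)^2],
    apd[x] is the fraction of samples equal to x, and p0 = apd[0].
    """
    x = np.asarray(samples, dtype=np.int64)
    n = x.size
    mean = x.sum() / n
    variance = (np.sum(x.astype(float) ** 2) - x.sum() ** 2 / n) / n
    apd = np.bincount(x, minlength=256) / n
    return mean, variance, apd, apd[0]
```

Of these, Miller retained only the variance $s^2$ and $P(0)$ as features.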


After an inspection of 2>3 samples, Miller found that these statistics had good statistical significance and could be useful for modulation classification. However, to keep the number of features low, only two APD parameters were retained: the variance $s^2$ and the probability that the output of the ADC equals 0, $P(0)$.
In a similar way, Miller studied the output of the PLL circuits in order to exploit the property that different types of PLL circuits tend to lock onto different signals. Each of the three loops (AM, FM and DSB) has two outputs used for the classification: a Lock indicator and a Modulation indicator. Considering 4096 samples, Miller created a table which is reproduced in Table 2. This table shows which PLL is locked to which modulation type.

Combining the APD and PLL results, Miller created a Reference Logic Table (see Table 2). The table contains eight binary features: two from the APD and six from the PLLs. The binary values are the results of hard decisions, comparing the ADC outputs with reference thresholds.


Signal     AM-M  AM-L  FM-M  FM-L  DSB-M  DSB-L  s^2   P(0)
AM         1     1     0     1     1      1      1     0
FM         X     0     1     1     X      0      0     0
PSK        1     0     1     1     1      1      0     0
CW         0     1     0     1     0      1      0     0
DSB        X     0     1     0     1      1      1     1
SSB        X     0     X     1     X      0      1     1
Threshold  .312  .812  .625  .500  .250   .812   256   .004

Table 2: Miller’s Reference Logic Table [25] (-M: Modulation indicator, -L: Lock indicator, X: don't care).


The classification is accomplished by comparing the binary features of the unknown signal with the reference table, looking for a perfect fit. In the terminology introduced in Section 2, this is a very simple binary tree classifier. The bandwidth of the bandpass filter is first set at 24 kHz; if there is no match, another attempt is made with the 6 kHz filter.
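The perfect-fit comparison against Table 2 can be sketched as follows. Two assumptions are made that the text does not state: a binary feature is 1 when the measured value exceeds its threshold, and `X` is treated as a don't-care. The `measure` callback standing in for the receiver and ADC hardware is hypothetical.

```python
# Rows of Table 2. Column order: AM-M, AM-L, FM-M, FM-L, DSB-M, DSB-L, s^2, P(0)
REFERENCE_TABLE = {
    "AM":  "11011110",
    "FM":  "X011X000",
    "PSK": "10111100",
    "CW":  "01010100",
    "DSB": "X0101111",
    "SSB": "X0X1X011",
}
THRESHOLDS = [0.312, 0.812, 0.625, 0.500, 0.250, 0.812, 256, 0.004]


def to_bits(measurements):
    """Hard decisions: '1' if a measurement exceeds its threshold (assumed)."""
    return "".join("1" if m > t else "0" for m, t in zip(measurements, THRESHOLDS))


def matches(bits, pattern):
    """A reference row fits if every non-'X' position agrees."""
    return all(p == "X" or p == b for b, p in zip(bits, pattern))


def classify_bits(bits):
    """Return the unique matching modulation, or None if zero or several rows fit."""
    hits = [name for name, pattern in REFERENCE_TABLE.items() if matches(bits, pattern)]
    return hits[0] if len(hits) == 1 else None


def classify_with_retry(measure):
    """Try the 24 kHz filter first, then fall back to the 6 kHz filter.

    `measure(bandwidth_khz)` is a hypothetical callback returning the eight
    measured feature values obtained with that IF filter bandwidth.
    """
    for bandwidth_khz in (24, 6):
        label = classify_bits(to_bits(measure(bandwidth_khz)))
        if label is not None:
            return label, bandwidth_khz
    return None, None
```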

The results reported by Miller are quite impressive: a percentage of correct identification higher than 90% for all modulation types. However, the SNR is not specified. To obtain these results, the classifier used 256 samples per try and a minimum of four tries per classification.

As  stated  previously,  the  Miller  prototype  ESM  receiver  was  built  under  a  DND 
contract.  An  evaluation  of  Miller’s  work  is  reported  by  Luiz  in  [31]. 
3.1.2 Wakeman

In 1985, a U.S. patent was registered [32] for a "Spectrum Surveillance Receiver System". It is Miller’s receiver; no upgrade to the classification procedure has been reported.


3.1.3  Fry 

Fry [7] also looked at Miller’s receiver. Some problems had been reported¹, and Fry was assigned by DREO to determine their origins. After some tests, he found that, due to the nature of the PLL design used, the accuracy of the identification process deteriorated rapidly if the receiver was not tuned to the center of the signal. He also noted that identification failed with real voice signals. Tests with the modified prototype (modifications to both the hardware and the decision tree) revealed 95% correct identification at 10 dB SNR when a sine wave was used as the modulating signal. Unfortunately, the accuracy fell drastically for real voice signals: with an infinite SNR (no noise) and a real voice signal (with the normal gaps associated with continuous speech), he obtained an average performance of around 60%.


3.2  GADBOIS  APPROACH 
3.2.1  Gadbois 

In 1985, Gadbois [33]-[35] introduced a novel approach to modulation identification, based on the envelope characteristics of the received signal. The feature selected is the ratio R of the variance to the square of the mean of the squared envelope. Intuitively, since FM has a constant envelope while AM does not, the former’s R is zero and the latter’s is close to unity. After some studies Gadbois found that "...SSB, DSB, AM and FM have very distinctive R’s...". He derived mathematical expressions relating R to the SNR, as presented in Figure 12. These expressions assume infinite record lengths. For a finite record of N samples, the estimate of R, denoted $\hat{R}$, is computed as follows.


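$\sigma^2 \approx \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} z_i^2 - \frac{1}{N^2}\left(\sum_{i=1}^{N} z_i\right)^2$

$\mu \approx \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} z_i$

$R \approx \hat{R} = \frac{\hat{\sigma}^2}{\hat{\mu}^2} = N\frac{\sum_{i=1}^{N} z_i^2}{\left(\sum_{i=1}^{N} z_i\right)^2} - 1$

where $z_i$ is the i-th sample of the squared envelope.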
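The last form of $\hat{R}$ is convenient because it needs only two running sums. The sketch below (Python/NumPy, hypothetical function name) computes it from envelope samples, assuming the envelope has already been extracted from the IF signal. For an ideal constant envelope (FM) it returns zero, and for a tone-modulated AM envelope $1 + m\cos\theta$ it returns a positive value.

```python
import numpy as np


def gadbois_ratio(envelope) -> float:
    """Estimate R = variance / mean^2 of the squared envelope.

    With z_i = envelope_i^2: R_hat = N * sum(z^2) / (sum z)^2 - 1.
    """
    z = np.asarray(envelope, dtype=float) ** 2
    n = z.size
    return float(n * np.sum(z ** 2) / np.sum(z) ** 2 - 1.0)


# Illustrative records of N = 2048 points
t = np.arange(2048)
fm_envelope = np.ones(2048)                                # constant envelope
am_envelope = 1.0 + 0.8 * np.cos(2 * np.pi * 16 * t / 2048)  # modulation index 0.8
```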


1. "Laboratory use of the receiver revealed three major shortcomings; speed, accuracy and friendliness.", and, about modulation classification, "For some reason, the unit did not work well and its performance fluctuated from day to day.".
Figure 12: Mathematical relations relating R to the SNR [35, p.152].


Computing $\hat{R}$ with N = 2048 points per record over a large number of records, Gadbois determined threshold values within a small decision tree for the classification of AM, SSB, DSB and FM. In his experiments, voice was simulated by low-pass filtered white Gaussian noise. The 12-bit ADC sampled at 160 kHz and the IF was 40 kHz. The results of this very simple classifier were computed from 200 records per modulation type and are presented in Table 3.


                   CLASSIFIED AS
ACTUAL     FM     AM     SSB    DSB
FM         200    0      0      0
AM         0      181    19     0
SSB        0      15     160    25
DSB        0      0      12     188

Table 3: Gadbois’ confusion table [33, p.22.5.4].
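Per-class and overall accuracies follow directly from Table 3: each diagonal entry is divided by its row total (200 records per modulation type). The short computation below gives 100% for FM, 90.5% for AM, 80% for SSB and 94% for DSB, i.e. 729 correct decisions out of 800 (about 91%).

```python
import numpy as np

labels = ["FM", "AM", "SSB", "DSB"]
# Rows: actual modulation; columns: classified as (Table 3)
confusion = np.array([
    [200,   0,   0,   0],
    [  0, 181,  19,   0],
    [  0,  15, 160,  25],
    [  0,   0,  12, 188],
])

per_class = confusion.diagonal() / confusion.sum(axis=1)
overall = confusion.trace() / confusion.sum()

for name, p in zip(labels, per_class):
    print(f"{name}: {100 * p:.1f}% correct")
print(f"overall: {100 * overall:.2f}% correct")
```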


In order to discriminate among constant-amplitude modulation types, Gadbois proposed three additional features: R2, R3 and R4. The main difference between FM, FSK and PSK resides in their phase characteristics. To extract the relevant characteristics, a digital phase-lock loop (DPLL) is used. The output of this PLL, being proportional to the phase derivative, can be roughly summarized as follows:
- for FM, the output is a Gaussian random process;
- for FSK, the output is a noisy square wave;
- for PSK, the output is a train of uniformly spaced impulses.


Figure 13: Gadbois’ system block diagram [34, p.39].


The discriminative characteristics of these waveforms can be enhanced by further processing, as shown in Figure 13. The output of the rectifier (a DC-block rectifier used to remove the carrier) is not significantly modified, except for an FSK signal, which produces an almost pure DC signal. Thus R2, being a measure of the signal variability, is potentially able to differentiate the three modulation types. However, simulations showed that R2 alone could hardly separate FM from PSK at SNRs below 20 dB. By adding a low-pass filter, Gadbois created an additional feature, R3, which would be similar to R2 except for PSK signals (the impulses corresponding to PSK signals contain high-frequency harmonics). Therefore, the ratio R2/R3, denoted R4, should provide the separation between FM and PSK:


-  for  FM,  R3  would  be  less  than  R2  because  the  noise  would  be  reduced  by  the  filter 

-  for  PSK,  R3  would  be  much  less  than  R2  since  both  noise  and  some  of  the 
baseband  signal  would  be  filtered. 

The accuracy obtained with these features is 98% at an SNR of 10 dB, assuming that only these three modulation types are present.
3.2.2 Ribble

Using methods similar to Gadbois’, Ribble [9] developed his system around the same features R1 and R2, but replaced R3 and R4 by ACPOW (see Figure 14).


Figure 14: Ribble’s system block diagram [9, p.20].


In the block diagram, R1 is Gadbois’ ratio R, and R2 is almost the same as Gadbois’. The last feature is a power parameter, ACPOW, which measures the power at the DPLL output¹. It therefore indicates the amount of phase/frequency change in the signal. For digital modulation schemes, the value of ACPOW decreases with the baud rate.

1. It is given that ACPOW = ACOUT/ACIN, where ACIN is a normalization factor.

Once again the decision process uses a logic tree. The value of R is used only to separate Group B (phase modulations) from Group A (amplitude modulations); the other features complete the identification of the modulation type. The time required for the computation and the decision is 1.7 s. The performance obtained is about the same as Gadbois’, but Ribble reduced the sampling time. He pushed the experiments further and replaced the simulated voice by real voice, obtaining drastically lower accuracies. However, Ribble did not pursue the problem further and concluded his report with this comment:

"If the prime modulating signal of concern is voice then a different data acquisition process will be necessary. It would have to reject segments of low/no modulation and stack together those segments which indicate modulation is present. This would obviously result in extremely long time frames to analyze voice, but this seems to be the only way to cope with this situation." Ribble [9, p.72].


3.2.3  UTL 

In 1989, DND/DEEM 4 gave a contract to UTL CANADA INC. [36] for an analytical examination of various classification techniques (Jondral, Callaghan, Gadbois, Ribble, Fabrizi, Wakeman and Gardner). The contract also called for the design of a prototype MRU (Modulation Recognition Unit).

UTL decided that Ribble’s approach was the most promising and tried to improve its performance further. The block diagram was roughly the same; however, the algorithm for evaluating the bandwidth was "improved". The logic tree for the decision process was also changed, incorporating a new feature, the product of the preceding features R1 and R2. The hardware was set with the following conditions: an IF frequency of 34 kHz with 12 kHz bandwidth, and a sampling frequency of 140 kHz. The performance was not significantly improved, and the results for real voice were still unsatisfactory (average probability of success around 25%).


3.2.4  Gallant 

Neither Ribble nor UTL was able to obtain good performance with Gadbois’ ratio. Gallant [3] looked at the problem again and obtained very interesting results.

First of all, he considered the problem of real voice signals. The preceding authors obtained good results with simulated voice, but their performance fell drastically with real voice. They all agreed that the gaps in voice signals create "unmodulated" segments which are unsuitable for classifying the signal:

"1.  Unmodulated  AM  and  FM  produce  a  pure  carrier  signal;  hence  they  are 
indistinguishable  from  each  other  and  CW;  and 

2.  Unmodulated  SSB  and  DSB  produce  no  transmitted  signal;  hence  they  are 
indistinguishable  from  each  other  and  noise"  Gallant  [3,  p.24] 

In order to solve this problem, Gallant added a preprocessor to Gadbois’ algorithm: before the ratio R is computed, the preprocessor removes the unmodulated segments. This preprocessor comprises three steps, which will now be described.
The first two steps allow the MRD to reject long stretches of unmodulated waveform on a segment-to-segment basis*. This procedure is performed by the Front-End, which is divided into two parts: the first part rejects low-energy segments, the second rejects noise-like segments.

Looking at the behavior of R, Gallant found that "anomalies" occur when the modulated signal envelope is small. The third step therefore consists of rejecting low-valued envelope sampling points, on a point-to-point basis. When N good points have been collected, a "pseudo fully-modulated" segment of 2048 points is formed.
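A rough sketch of steps 1 and 3 is given below, assuming a sampled envelope in a NumPy array. The energy and point thresholds are hypothetical values, and step 2 (rejection of noise-like segments) is omitted because its test is not described here.

```python
import numpy as np


def pseudo_fully_modulated(envelope, seg_len=2048, energy_thresh=0.1,
                           point_thresh=0.2):
    """Sketch of Gallant-style preprocessing (steps 1 and 3 only).

    Step 1: drop whole segments whose mean energy is below energy_thresh.
    Step 3: inside the kept segments, drop individual envelope points below
    point_thresh, then gather the surviving points into one seg_len record.
    Both thresholds are hypothetical. Returns None if fewer than seg_len
    good points are available.
    """
    env = np.asarray(envelope, dtype=float)
    n_seg = env.size // seg_len
    good = []
    for k in range(n_seg):
        seg = env[k * seg_len:(k + 1) * seg_len]
        if np.mean(seg ** 2) < energy_thresh:    # step 1: low-energy segment
            continue
        good.append(seg[seg >= point_thresh])    # step 3: low-valued points
    pts = np.concatenate(good) if good else np.empty(0)
    return pts[:seg_len] if pts.size >= seg_len else None


# Illustrative envelope: one silent segment followed by two AM segments
t = np.arange(2048)
am = 1.0 + 0.5 * np.cos(2 * np.pi * 8 * t / 2048)
env = np.concatenate([np.zeros(2048), am, am])
record = pseudo_fully_modulated(env)
```

The resulting record would then be passed to the computation of $\hat{R}$ in place of the raw segment.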


The  improvement  obtained  by  this  preprocessing  prior  to  the  computation  of  R  is 
quite  significant.  Figures  15  and  16  show  the  system  block  diagram  and  the  results  with 
and  without  the  preprocessing. 

As can be seen in Figure 16, there is no real difference between AM and FM, especially for low modulation index AM. Gallant therefore proposed a new feature, denoted VAR, able to discriminate very low modulation index AM². This new feature is the variance of the segment-to-segment envelope variance.
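Under the assumption that the envelope is split into consecutive 2048-point segments (the segment length used by Gallant), VAR can be sketched as the variance, across segments, of the per-segment envelope variance. The function name and segmentation details are assumptions of this sketch.

```python
import numpy as np


def var_feature(envelope, seg_len=2048) -> float:
    """VAR: variance, across segments, of each segment's envelope variance."""
    env = np.asarray(envelope, dtype=float)
    n_seg = env.size // seg_len
    seg_vars = env[: n_seg * seg_len].reshape(n_seg, seg_len).var(axis=1)
    return float(seg_vars.var())


# Illustrative envelopes: two segments with different modulation indices
t = np.arange(2048)
seg_a = 1.0 + 0.1 * np.cos(2 * np.pi * 8 * t / 2048)   # per-segment variance 0.005
seg_b = 1.0 + 0.5 * np.cos(2 * np.pi * 8 * t / 2048)   # per-segment variance 0.125
```

A constant envelope gives VAR = 0, while segments whose envelope variability changes from one segment to the next (here 0.005 and 0.125) give VAR = 0.0036.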


Figure 15: Experiment block diagram [3, p.47].


* A segment is formed by N = 2048 points and takes 64 ms at a sampling rate of 32 kHz. The complete classification process requires Lj segments.

2. "...some military radios are known to have modulation indices in the range 45% to 65%..." Gallant [3, p.49].
Figure 16: Variance of R diminished by preprocessing (horizontal axis: C/N in dB).

a) Gadbois’ ratio R (Gallant’s experimental results) [3, p.23]
b) New feature proposed by Gallant [3, p.48]
Figure 17: The new feature VAR proposed by Gallant [3, p.54].


The decision process is also a logic tree, but a much simpler one than Ribble’s because there are only two features. The performance obtained with real voice at SNRs above 8 dB is quite good (90%). However, the acquisition time is long: at least 1.5 seconds.


3.3  OTHERS 


3.3.1  Weaver 

Probably the first author to publish on modulation type classification was Weaver [37], in 1969. He proposed the use of pattern-recognition techniques to automatically identify the type of modulation of HF radio signals. The intercepted signal was passed through an 8 kHz analog bandpass filter and down-converted to baseband. It was then digitized with an 8-bit ADC at a 16 kHz sampling rate. The resulting digital signal fed a bank of 29 parallel narrowband filters and envelope detectors, whose outputs were averaged over an observation time of one second. These mean values formed a 29-dimensional feature vector used for the decision process. The classifier is an analog implementation of a linear classifier. Weaver claims 95% classification accuracy at "typical" SNRs for separating AM and SSB. He warns that speech breaks give rise to unmodulated signals and that a real system may require several seconds of observation time.
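Weaver's feature vector is essentially a coarse, time-averaged power spectrum. The sketch below substitutes an FFT periodogram, averaged over 29 equal-width bands between 0 and fs/2, for his bank of narrowband filters and envelope detectors; this substitution and the normalization to unit sum are assumptions for illustration, not Weaver's implementation.

```python
import numpy as np


def band_energy_features(x, n_bands=29):
    """Mean periodogram power in n_bands equal-width bands from 0 to fs/2.

    Stand-in for a bank of narrowband filters and envelope detectors.
    Assumes the record is long enough that every band holds at least one
    FFT bin. Returns a vector normalized to unit sum.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = np.array([spectrum[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])
    return bands / bands.sum()


# One second of a 1 kHz tone at the 16 kHz sampling rate used by Weaver
fs = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
features = band_energy_features(tone)
```

Each band here is about 276 Hz wide, so the 1 kHz tone falls in the fourth band (index 3); a linear classifier would then operate on the 29-dimensional vector.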


3.3.2  Winkler 

Winkler [22]¹ proposed a technique based on characterizing the amplitude and phase spectra. He used a pure sinusoid as the modulating signal and obtained 100% accuracy, although he admitted that gross classification errors occur when the baseband signal has a noisy spectrum.


3.3.3  Callaghan 

Callaghan [38] also based his classifier on classical pattern-recognition theory, using a linear classifier. He sampled the signal envelope and the zero-crossings of the signal (to obtain the instantaneous frequency), then computed their means and standard deviations. The sampling frequency was 100 Hz and the observation time 2 seconds. He reported only the performance for separating AM from FM: 99% for an SNR above 20 dB. Callaghan concluded his paper by admitting that his features were very sensitive to noise.
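Estimating instantaneous frequency from zero-crossings can be sketched as follows: successive crossings are half a period apart, so each interval $\Delta t$ gives $f \approx 1/(2\Delta t)$. Crossing instants are linearly interpolated between samples. The feature pair (mean, standard deviation) mirrors the statistics described above, but the function names and details are assumptions of this sketch.

```python
import numpy as np


def zero_crossing_frequency(x, fs):
    """Instantaneous-frequency estimates (Hz) from zero-crossing intervals.

    Needs at least two zero crossings in the record.
    """
    x = np.asarray(x, dtype=float)
    pos = x >= 0
    idx = np.nonzero(pos[:-1] != pos[1:])[0]
    # Linear interpolation of the crossing instant between samples idx and idx + 1
    frac = x[idx] / (x[idx] - x[idx + 1])
    t = (idx + frac) / fs
    # Consecutive crossings are half a period apart
    return 1.0 / (2.0 * np.diff(t))


def zero_crossing_features(x, fs):
    """Mean and standard deviation of the zero-crossing frequency."""
    f = zero_crossing_frequency(x, fs)
    return float(f.mean()), float(f.std())


# Illustrative 1 kHz tone sampled at 48 kHz
fs = 48000
x = np.sin(2 * np.pi * 1000 * np.arange(4800) / fs + 0.3)
mean_f, std_f = zero_crossing_features(x, fs)
```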


3.3.4  Fabrizi 

Instead of considering the variability of the envelope, Fabrizi [39] suggested a new feature, the envelope peak-to-mean ratio, denoted P*. He also used the mean M of the instantaneous frequency.
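Both of Fabrizi's features are simple to compute. The sketch below assumes an envelope array for P* = peak / mean, and a complex baseband (I/Q) record for the mean instantaneous frequency, obtained from the unwrapped phase difference. These input formats and the function names are assumptions of this sketch.

```python
import numpy as np


def peak_to_mean(envelope) -> float:
    """Envelope peak-to-mean ratio P* = max(A) / mean(A)."""
    a = np.asarray(envelope, dtype=float)
    return float(a.max() / a.mean())


def mean_inst_freq(iq, fs) -> float:
    """Mean instantaneous frequency (Hz) of a complex baseband record."""
    phase = np.unwrap(np.angle(np.asarray(iq)))
    return float(np.mean(np.diff(phase)) * fs / (2 * np.pi))


# Illustrative signals
t = np.arange(2048)
am_env = 1.0 + 0.5 * np.cos(2 * np.pi * 8 * t / 2048)   # P* = 1.5
fs = 8000
iq = np.exp(1j * 2 * np.pi * 500 * np.arange(fs) / fs)   # 500 Hz offset
```

A constant envelope gives P* = 1, and larger envelope peaks relative to the mean give larger values.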


1. The information comes from Gallant [3, p.9-10]; the reference is not available at this time.
Figure 18: Fabrizi’s features (horizontal axes: S/N in dB) [39, p.138].


With only these two features, Fabrizi built a logic tree for the decision process (see Figure 19). The parameters were collected over a 250 ms observation time, using a 32 kHz sampling frequency and 3 kHz filtered real voice. Simulations showed that AM could not be separated from FM using the instantaneous frequency parameter at SNRs below 35 dB. However, SSB could be separated from the AM/FM group at SNRs above 5 dB.
Figure 20: Petrovic’s classifier [40, p.387].


The signal activity feature of Petrovic’s classifier enables it to determine whether or not a signal is present. The second activity factor represents the percentage of time the envelope exceeds a predetermined threshold. The amplitude variation feature is the mean value of the full-wave rectified AC signal envelope. The FM demodulator gives the instantaneous frequency of the signal. The demodulator output is applied to a 3 kHz low-pass filter and averaged to detect narrowband (analog) FM; it is also applied to a 3 kHz high-pass filter to detect wideband (digital) FM.
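Two of these envelope features can be sketched in a few lines, assuming a sampled envelope array; the threshold is a hypothetical parameter. Removing the envelope mean before full-wave rectification is one reading of "AC signal envelope", and is an assumption of this sketch.

```python
import numpy as np


def activity_factor(envelope, threshold) -> float:
    """Fraction of time the envelope exceeds a preset threshold."""
    a = np.asarray(envelope, dtype=float)
    return float(np.mean(a > threshold))


def amplitude_variation(envelope) -> float:
    """Mean of the full-wave rectified AC component of the envelope."""
    a = np.asarray(envelope, dtype=float)
    return float(np.mean(np.abs(a - a.mean())))


# Illustrative AM envelope with modulation index 0.5
t = np.arange(2048)
am_env = 1.0 + 0.5 * np.cos(2 * np.pi * 8 * t / 2048)
```

For a constant envelope (ideal FM) the amplitude variation is zero; for the tone-modulated AM envelope above it is close to 0.5 x 2/π, the mean of the rectified cosine.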


The decision process is accomplished with a reference logic table (a variation of the decision tree). The IF frequency is not given, but the author states that the optimal IF bandwidth is 15 kHz. The confusion matrix is not given.


3.4  AISBETT  APPROACH 


A lot of work on modulation recognition has been done at the Electronic Research Laboratory, Department of Defence, Defence Research Centre Salisbury, Adelaide, South Australia. This section presents new features, claimed to be "noise resistant", proposed by two of its authors, Aisbett and Einicke.


3.4.1  Aisbett 

Almost every paper presented so far proposed features which are very sensitive to noise; recall, for example, that Fabrizi was not able to separate AM from FM at SNRs below 35 dB. Aisbett [41][42] closely examined the effect of additive white Gaussian noise (AWGN) on time-invariant features for pattern recognition.

She observed that most published modulation recognition schemes perform poorly because the authors chose signal parameters that can only be estimated with a bias in the presence of AWGN. She shows, for example, how the sample mean and sample variance of the instantaneous frequency vary with SNR for a number of analog modulation types. She therefore proposed time-domain signal parameters that are unbiased estimators of the true signal parameters in the presence of AWGN with a symmetric spectral density.

The three new noise-resistant features proposed are A², AA' and A²φ', where A is the signal envelope, A' the signal envelope derivative and φ' the instantaneous frequency. Considering very low SNRs (3 dB), she claims that discrimination between AM, FM, DSB and CW is possible by characterizing the statistical distribution functions of the new parameters.
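As a rough illustration, the three features can be computed from sampled I and Q channels using the identities A² = I² + Q², AA' = II' + QQ' and A²φ' = IQ' - I'Q. Approximating I' and Q' by central differences, and the function name `aisbett_features`, are assumptions of this sketch rather than details from [41][42]:

```python
import math


def aisbett_features(i_samples, q_samples):
    """Return lists of A^2, AA' and A^2*phi' for the interior samples.

    Derivatives are approximated by central differences in per-sample
    units (an assumption of this sketch).
    """
    a2, aa_dot, a2_phi_dot = [], [], []
    for n in range(1, len(i_samples) - 1):
        i, q = i_samples[n], q_samples[n]
        di = (i_samples[n + 1] - i_samples[n - 1]) / 2.0   # I'
        dq = (q_samples[n + 1] - q_samples[n - 1]) / 2.0   # Q'
        a2.append(i * i + q * q)              # A^2 = I^2 + Q^2
        aa_dot.append(i * di + q * dq)        # AA' = I I' + Q Q'
        a2_phi_dot.append(i * dq - di * q)    # A^2 phi' = I Q' - I' Q
    return a2, aa_dot, a2_phi_dot


# Unmodulated carrier offset by w rad/sample (hypothetical test input)
w = 0.1
I = [math.cos(w * n) for n in range(64)]
Q = [math.sin(w * n) for n in range(64)]
a2, aa, a2p = aisbett_features(I, Q)
```

For this noise-free carrier the sketch returns A² = 1, AA' = 0 and A²φ' = sin(w) ≈ w at every interior sample; dividing A²φ' by A² recovers the instantaneous frequency.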


Figure 21: Distributions of Aisbett's parameter and of the typical parameter A (SNR = 1 dB) [41, p.12, 17].
It is shown in Figure 21 that for very low SNR (−1 to 3 dB), the typical parameter A (signal envelope) has a distribution function that offers no discrimination between the modulation types. However, the distribution functions for Aisbett's parameter are distinct and could therefore be used for classification.

She was satisfied with her preliminary work and envisaged further simulations with a much larger database in order to implement a classifier using a pattern-recognition algorithm applied to these new features.


3.4.2  Einicke 

A few years later, Einicke [6] published a paper describing a classifier based on Aisbett's features:


$$A^2 = I^2 + Q^2$$

$$AA' = II' + QQ'$$

$$A^2\varphi' = IQ' - I'Q$$

He added two classical parameters, the signal envelope and the instantaneous frequency:

$$A = \left(A^2\right)^{1/2}$$

$$F = \varphi' = \frac{A^2\varphi'}{A^2}$$


As stated by Aisbett, the statistical distributions of these parameters should have good discriminating power even at low SNRs. Jondral's method¹ of using the histogram as a feature vector is potentially very powerful. However, the computation it requires is too demanding for a real-time system. One of the best ways to describe a statistical distribution compactly is to use the standard deviation σ, the coefficient of skewness γ and the kurtosis β.
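The three distribution descriptors can be computed from a block of feature samples as sketched below. Whether Einicke used population moments or bias-corrected ones, and whether kurtosis was taken as m4/m2² or as excess kurtosis, is not stated here, so both choices in the code are assumptions:

```python
import math


def shape_statistics(values):
    """Standard deviation, coefficient of skewness and kurtosis of a sample.

    Uses population (biased) central moments; kurtosis is m4 / m2**2,
    not excess kurtosis (both are assumptions of this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sigma = math.sqrt(m2)
    skewness = m3 / sigma ** 3
    kurtosis = m4 / m2 ** 2
    return sigma, skewness, kurtosis


print(shape_statistics([1.0, -1.0, 1.0, -1.0]))   # (1.0, 0.0, 1.0)
```

Applied to each of the five parameter sequences (A², AA', A²φ', A, F), this would give a compact feature vector in place of a full histogram.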

The feature vectors are computed for each modulation type for three classes of signals: strong (30 dB), medium (15 dB) and weak (5 dB). The parameters in the database are obtained through digital signal processing of data generated by a 12-bit ADC with a sampling rate of 20 kHz. For the decision process, a linear classifier (Fisher's functions) is applied to these reference feature vectors. He also tried a quadratic classifier but found no improvement. The performance obtained depends on the sample acquisition time. For an acquisition time of 409 ms, the overall performance is around 94% (5 dB < SNR < 30 dB).


¹ See the digital modulation section.
3.5 HIPP APPROACH


3.5.1 Hipp

Feature selection in modulation classification is largely intuitive: the choice of features depends upon the author's background, imagination, etc. Hipp [8] used a different way to select his features. He performed a systematic and exhaustive evaluation of more than twenty features, of which he retained only six. The results obtained are very impressive: he classified almost all modulation types (digital and analog) with an overall performance of 95% at SNRs down to 10 dB. For these reasons, his paper is presented in a separate section.

The features used are based on statistical central moments: standard deviation, standard deviation divided by its mean, skewness and kurtosis. These statistical parameters are used to describe the following signal characteristics: amplitude, phase, frequency, spectrum, squared signal spectrum, in-phase channel and quadrature channel.


                              Amp  Phase  Cos    Sin    Freq  Spectrum(i)  Squared
                                          Phase  Phase                     Spectrum(ii)
Standard Deviation            S    X      X      X      X     S            S
Skewness                      S    X      X      X      X     X            X
Kurtosis                      X    X      X      X      X     X            X

Phase Vector Quality Factor, √(cos²(φ) + sin²(φ)):   S
RMS Amplitude:                                        X
Signal Standard Deviation of Spectrum(ii):            S

Notes:
S: Hipp's selected features
X: features not selected
i: threshold at 3 dB above noise floor
ii: threshold at 3 dB below spectral peak

Table 4: Features evaluated by Hipp [8, p.20.2.5].
To evaluate all of these features, Hipp collected 400 sets of 1024 data points for each modulation type, with SNRs varying from 100 dB to 10 dB. He then performed a statistical analysis of the database in order to select the parameters with the greatest discriminating power. This evaluation was done by stepwise discriminant analysis with a linear classifier (Fisher's functions).

The resulting six-feature vectors allowed an unknown signal to be classified with an overall probability of correct classification of 95%. The features retained are: the amplitude standard deviation, amplitude skewness, phase spread, spectrum standard deviation (threshold at 3 dB above noise floor), spectrum standard deviation (threshold at 3 dB below peak) and squared signal spectrum standard deviation (threshold at 3 dB below peak).

The simulation was done with a sample acquisition time of 26 ms at a sampling frequency of 40 kHz, for an IF of 10 kHz. The carrier frequency was randomly selected within 1 kHz of the nominal frequency. The IF bandwidth was fixed at 20 kHz. Unfortunately, some modulation parameters remained constant during the simulations: AM modulation index fixed at 90%, FSK frequency deviation at 500 Hz, baud rates of 300 and 1200 baud, and FM frequency deviation at 3 and 5 kHz. Moreover, it is not specified how the modulating signal (for the analog modulation schemes) was generated. As shown by Ribble, UTL, Gallant and Fry, simulated voice and real voice do not lead to the same classification performance.
4.0 MR FOR DIGITAL MODULATIONS


Interest in digital modulation classification is growing, yet the number of publications is still small. For this reason, the classifiers are divided into only two groups. The first is represented by Liedtke and Jondral. These two authors, especially the latter, are very well known and are cited in almost every paper on the topic. The second group includes authors who used approaches different from Liedtke's.


4.1  LIEDTKE  AND  JONDRAL  APPROACH 

4.1.1  Liedtke 

One of the first authors to publish on modulation type identification, Liedtke [2] was also the first to present the concept of modulation recognition applied to digital modulation schemes. The system he proposed is fully digital, as shown in Figure 22. The output of the receiver is digitized (in-phase and quadrature channels) and then filtered by a bank of parallel FIR narrow-band filters. These filters have the same center frequency but different bandwidths. "The best classification result will be automatically obtained behind that filter which matches the signal bandwidth best."

The feature extraction is accomplished with a "universal demodulator". The features are the amplitude, phase and instantaneous frequency (see Figure 23). To obtain the parameter values synchronously, a timing recovery procedure is used. The parameters are defined as:


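$$\text{Amplitude} = \left(I^2 + Q^2\right)^{1/2}$$

$$\text{Instantaneous frequency} = \frac{1}{2\pi}\left.\frac{d\varphi(t)}{dt}\right|_{t=t_i} \approx \frac{1}{2\pi}\,\frac{\varphi_{i+1} - \varphi_{i-1}}{2\,\Delta t}$$

$$\varphi = \arctan(Q/I)$$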
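The three definitions can be sketched in Python as follows. The sketch uses `atan2` instead of arctan(Q/I) so that all four quadrants are resolved, wraps the central phase difference into (−π, π], and omits Liedtke's timing recovery; the function name and the test tone are hypothetical:

```python
import cmath
import math


def universal_demodulator(i_samples, q_samples, fs):
    """Amplitude, phase (rad) and instantaneous frequency (Hz) from I/Q samples.

    The central difference phi[i+1] - phi[i-1] is taken as the angle of
    z[i+1] * conj(z[i-1]), which wraps it into (-pi, pi].
    """
    z = [complex(i, q) for i, q in zip(i_samples, q_samples)]
    amplitude = [abs(s) for s in z]                    # (I^2 + Q^2)^(1/2)
    phase = [math.atan2(s.imag, s.real) for s in z]     # four-quadrant phi
    dt = 1.0 / fs
    inst_freq = [
        cmath.phase(z[k + 1] * z[k - 1].conjugate()) / (2 * math.pi * 2 * dt)
        for k in range(1, len(z) - 1)
    ]
    return amplitude, phase, inst_freq


# Hypothetical test: a 100 Hz complex tone sampled at 1 kHz
fs, f0 = 1000.0, 100.0
I = [math.cos(2 * math.pi * f0 * n / fs) for n in range(50)]
Q = [math.sin(2 * math.pi * f0 * n / fs) for n in range(50)]
amp, ph, f_inst = universal_demodulator(I, Q, fs)
```

For the test tone the amplitude is 1 and the instantaneous frequency is 100 Hz at every interior sample; histograms of `amp`, `ph` and `f_inst` would then form the features described next.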


The values are compiled to form histograms of amplitude, frequency and phase. These histograms are used as features for the classification. The 256-point phase histogram of BPSK has peaks at 0° and 180°. For QPSK, the peaks are located at 0°, 90°, 180° and 270°, and likewise for 8-PSK. The phase histogram is therefore used to classify these modulation types. The objective is to use the histogram as input to a procedure that recognizes the number of phases. A pattern-recognition algorithm could be used, taking each cell of the histogram as an element of the feature vector. However, to simplify the computation, Liedtke used suboptimal weighting functions, as shown in Figure 24.


Figure  22:  Block  diagram  of  Liedtke’s  classifier  [2,  p.313]. 


Figure 23: Liedtke's universal demodulator [2, p.314].


Figure  24:  Classification  of  PSK  signals  by  Liedtke 

a)  Actual  phase  histogram  for  QPSK  [2,  p.315] 

b)  Suboptimal  weighting  function  for  QPSK  [2,  p.316] 


The phase histogram of the unknown signal is compared to the weighting functions and a similarity value is attributed to each. The largest output is retained and denoted DPHI; it corresponds to the number of phases detected (2, 4 or 8).
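Liedtke's weighting functions are only shown graphically (Figure 24), so the sketch below swaps in a different, well-known test with the same goal: the M-th power phase moment |mean(exp(jMφ))|, which is close to 1 when the phases cluster on M equally spaced values. This is not Liedtke's method; the threshold of 0.5 and the function name are arbitrary assumptions:

```python
import cmath
import math


def count_phases(phases, threshold=0.5, candidates=(2, 4, 8)):
    """Smallest M in candidates whose M-th power phase moment exceeds threshold.

    Returns 1 when no candidate qualifies (no PSK structure found).
    Raising to the M-th power maps M equally spaced phases onto one point,
    and taking the magnitude makes the test independent of phase offset.
    """
    for m in candidates:
        moment = sum(cmath.exp(1j * m * p) for p in phases) / len(phases)
        if abs(moment) > threshold:
            return m
    return 1


bpsk = [0.3, math.pi + 0.3] * 8                 # offset by 0.3 rad
qpsk = [k * math.pi / 2 for k in range(4)] * 4
print(count_phases(bpsk), count_phases(qpsk))   # 2 4
```

The smallest qualifying M is returned because a BPSK phase set also satisfies the 4- and 8-phase tests.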

If  the  recognition  criterion  of  PSK  modulation  types  is  not  satisfied,  the 
classification  of  ASK  and  FSK  is  investigated.  To  identify  these  modulations,  the 
variances  of  the  amplitude  and  frequency  (AVAR  and  FVAR  respectively)  are  calculated. 
A  large  AVAR  indicates  ASK,  and  a  large  FVAR,  FSK. 

In a similar way to the phase histogram, the amplitude and frequency histograms are also used. The amplitude histogram of ASK contains two peaks, as does the frequency histogram of FSK. These histograms should contain only one peak for other modulation schemes. The resulting variables are AHI and FHI, respectively.
The decision process is accomplished with three Boolean equations: one for the PSK modulation types, one for ASK and one for FSK. The decision function for PSK is

$$\left[\max_{i}\left(\mathrm{DPHI}_i\right) > \mathrm{TDPHI}\right]\cdot\left[\mathrm{AVAR} < \mathrm{TLAVAR}\right]\cdot\left[\mathrm{FVAR} < \mathrm{TFVAR}\right] = \mathrm{TRUE}$$

where
i = 2, 4 or 8 is the number of phases
TDPHI is the threshold for the phase histogram
TLAVAR is the threshold for the amplitude variance
TFVAR is the threshold for the frequency variance
the symbol · is a logical AND


If the function is TRUE for i = 2, BPSK is detected; if i = 4, QPSK is detected; and if i = 8, 8-PSK is present. Otherwise, i.e. if DPHI is optimal for i = 1, the test continues with ASK and FSK. The equations are the following:

$$[\mathrm{AHI} > \mathrm{TAHI}]\cdot[\mathrm{AVAR} > \mathrm{TUAVAR}] = \mathrm{TRUE} \quad \text{for ASK}$$

$$[\mathrm{FHI} > \mathrm{TFHI}]\cdot[\mathrm{FVAR} > \mathrm{TFVAR}]\cdot[\mathrm{AVAR} < \mathrm{TLAVAR}] = \mathrm{TRUE} \quad \text{for FSK}$$

where
TAHI is the threshold for AHI
TUAVAR is the upper threshold of amplitude variance
TFHI is the threshold for FHI
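The three Boolean decision functions translate directly into a short rule cascade. In this sketch the threshold values and the "unknown" fallback are hypothetical (the source gives neither), and `dphi` is assumed to map each candidate number of phases to its similarity value:

```python
def liedtke_decision(dphi, avar, fvar, ahi, fhi, th):
    """Apply the PSK, ASK and FSK Boolean decision functions in turn.

    dphi maps the number of phases (2, 4, 8) to its similarity value;
    th holds the thresholds TDPHI, TLAVAR, TUAVAR, TFVAR, TAHI and TFHI.
    """
    best = max(dphi, key=dphi.get)          # number of phases with the largest DPHI
    if dphi[best] > th["TDPHI"] and avar < th["TLAVAR"] and fvar < th["TFVAR"]:
        return {2: "BPSK", 4: "QPSK", 8: "8-PSK"}[best]
    if ahi > th["TAHI"] and avar > th["TUAVAR"]:
        return "ASK"
    if fhi > th["TFHI"] and fvar > th["TFVAR"] and avar < th["TLAVAR"]:
        return "FSK"
    return "unknown"                        # fallback not specified in the source


# Hypothetical threshold values, for illustration only
thresholds = {"TDPHI": 0.6, "TLAVAR": 0.1, "TUAVAR": 0.3,
              "TFVAR": 0.2, "TAHI": 0.5, "TFHI": 0.5}
print(liedtke_decision({2: 0.3, 4: 0.9, 8: 0.4}, 0.05, 0.05, 0.1, 0.1, thresholds))  # QPSK
```

The order of the tests mirrors the text: PSK is tried first, and ASK and FSK are examined only when the PSK criterion fails.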


These functions and the thresholds are schematized in Figure 25, where the five separation parameters are shown together. The dashed lines indicate which classes are separated by which separation parameters. The overlap between PSK and FSK indicates some remaining difficulties.


Figure 25: Schematized class space with separation parameters [2, p.318].
The results shown in Figure 26 indicate a classifier that is very sensitive to noise.


Figure  26:  Liedtke’s  results  [2,  p.319]. 


4.1.2 Jondral

Jondral [28][29] used an approach similar to Liedtke's, to which he added two analog modulation types (AM and SSB) and pattern-recognition techniques for the decision process.

The features used by the author are based on histograms. However, he did not use Liedtke's synchronization system. Therefore the histograms are different, although the parameters are similar: amplitude, instantaneous frequency and phase. The phase characteristic called zero phase, used to detect BPSK, is obtained by squaring the signal to create a carrier at twice the frequency, which can be caught by a tracking loop; this results in the detection of the 0 and π phases in a BPSK signal.

Collecting the parameters over 4096 points, a histogram of 192 cells is computed (see Figure 27). This histogram is used as the feature vector. However, Jondral realized that the features between positions 60 and 140 do not contribute to the discrimination. The final results were obtained with 93-dimensional feature vectors. This is considerably larger than the feature set of Gadbois, who used only a few features.
Figure 27: Examples of histograms (amplitude and instantaneous frequency; vertical axis: number of events) for PSK 2, SSB suppressed carrier (A3J) and noise [28, p.185].


The performance obtained is very good, but the SNRs are not specified. The system operated on a signal down-converted to 200 kHz. The sampling frequency of the 12-bit ADC is chosen according to the IF signal bandwidth: 2.2 kHz to 22.2 kHz for bandwidths of 300 Hz to 6 kHz, respectively. To acquire the 4096 data points, an acquisition time of 0.18 to 1.8 seconds was required. The decision function was based on a pattern-recognition technique. Jondral tried both linear and quadratic classifiers (minimum mean squared error), obtaining overall accuracies of 93% and 98%, respectively. To reduce the number of terms for the quadratic classifier, he applied the Karhunen-Loève transform to the feature vectors. In that way, 30 transformed features (containing about 97% of the information) were used in the classifier instead of 93 features.


4.1.3  Dominguez 

Also using a histogram as the feature vector, Dominguez [43][44] built a recognition system very similar to Jondral's. The parameters used in his histogram are the amplitude, the instantaneous frequency and the instantaneous phase φ(k):

$$\varphi(k) = \arctan\{Q(k)/I(k)\}, \qquad k < 3000 \text{ points}$$

The dimension of the vectors is 79: 31 components corresponding to the amplitude, 31 components for the frequency and 17 for the phase. The classifier was linear. The overall performance is 95%, and the system recognizes all modulation types. However, neither the SNR nor the message signal used for the analog schemes is given.
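A sketch of how such a 79-component histogram feature vector might be assembled. The cell ranges for each parameter are not given in the text, so the `ranges` values below are hypothetical, as is clamping out-of-range samples into the edge cells:

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized histogram with equal-width cells over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        k = min(max(k, 0), bins - 1)      # clamp out-of-range samples to edge cells
        counts[k] += 1
    return [c / len(values) for c in counts]


def feature_vector(amplitude, frequency, phase, ranges):
    """79 components: 31 amplitude, 31 frequency and 17 phase cells."""
    return (histogram(amplitude, 31, *ranges["amplitude"])
            + histogram(frequency, 31, *ranges["frequency"])
            + histogram(phase, 17, *ranges["phase"]))


# Hypothetical cell ranges (normalized amplitude, frequency in cycles/sample)
ranges = {"amplitude": (0.0, 2.0), "frequency": (-0.5, 0.5), "phase": (-math.pi, math.pi)}
v = feature_vector([1.0] * 10, [0.0] * 10, [0.0] * 10, ranges)
print(len(v))   # 79
```

Each third of the vector sums to one, so signals of different lengths produce comparable feature vectors for the linear classifier.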

In [43], Dominguez also presented a preprocessor. Noticing that the system works better if the signal is perfectly centered on the IF frequency, he proposed a preprocessor able to estimate the carrier frequency. This preprocessor also explores the spectrum to detect adjacent modulated signals; it is similar to an energy detection subsystem.

To analyze the spectrum, a periodogram is used. First, the spectrum is differentiated to detect carriers. Then the symmetry around the carriers is studied. The output is the number of signals. The classification algorithm is run only if there is a single signal. An example is shown in Figure 28. The symmetry around the first carrier indicates a first modulation type spread on both sides of the carrier. A second carrier is present in the spectrum, indicating a second modulation type. This signal would have to be filtered out by the preprocessor to allow the first signal to be recognized.


Figure 28: Intercepted spectrum [43].
4.1.4 Adams


Adams [45] proposed an improvement to Jondral's recognition system by using a new classification process. On the same 192 features, he applied the PCA (Principal Component Analysis) algorithm to reduce the dimension of the pattern vector. The author did not give the size of the resulting vector. The MANOVA algorithm is then applied to these new vectors to optimize the discrimination; this is a linear discrimination technique. MANOVA is a conventional multivariate statistical technique discussed in [46].


No performance results are given in the paper. It is not obvious that this technique would perform better than Jondral's. Jondral also used data reduction, reducing the dimension of his vectors from 192 to 30 with the Karhunen-Loève transform while keeping 97% of the information. Moreover, Jondral used multivariate linear and quadratic discrimination techniques.


4.2  OTHERS 
4.2.1  Mammone 

The paper by Mammone [10] presents a recognition system for PSK signals, so only two modulation schemes (BPSK and QPSK) are concerned. However, he also presented a technique for estimating the bit rate. The phase derivative is used to find the transitions that occur between successive data symbols.

The received signal is digitized and expressed as z(k). The phase φ(k) and its derivative are expressed by:

$$\varphi(k) = \arctan\{\mathrm{Im}[z(k)]/\mathrm{Re}[z(k)]\}$$

$$\varphi'(k) = \varphi(k) - \varphi(k-1)$$


The carrier frequency is found by averaging φ' (i.e. the average of the instantaneous frequency). The bit rate estimation requires further processing: φ' exhibits pulses at the symbol transitions. The intervals between pulses are multiples of the symbol period, which, given the modulation type, yields the bit rate. The amplitude of the pulses indicates the phase shift, π/2 or π, and is therefore a feature for discriminating BPSK from QPSK. However, because the signals are filtered and phase noise is present, the instantaneous phase shifts do not represent the true phase shifts very accurately. The author presented another approach able to estimate the amplitude of these phase shifts more accurately.
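The phase-derivative idea can be sketched as follows, under several assumptions that are not from [10]: samples with |φ'| above a fixed threshold are treated as transition pulses, the carrier is averaged only over the remaining samples, and the symbol period is taken as the greatest common divisor of the intervals between pulses (valid for noise-free, integer-spaced transitions):

```python
import cmath
import math
from functools import reduce


def phase_derivative(z):
    """phi'(k) = phi(k) - phi(k-1), wrapped into (-pi, pi]."""
    return [cmath.phase(z[k] * z[k - 1].conjugate()) for k in range(1, len(z))]


def analyse_transitions(z, pulse_threshold=1.0):
    """Return (carrier in rad/sample, pulse amplitudes, symbol period in samples).

    |phi'| above pulse_threshold is treated as a symbol transition (assumption);
    the carrier is the mean phi' over the remaining samples, and the symbol
    period is the GCD of the intervals between successive pulses.
    """
    dphi = phase_derivative(z)
    pulses = [k for k, d in enumerate(dphi) if abs(d) > pulse_threshold]
    quiet = [d for d in dphi if abs(d) <= pulse_threshold]
    carrier = sum(quiet) / len(quiet)
    # Pulse height once the carrier term is removed: about pi/2 (QPSK) or pi
    amplitudes = [abs(cmath.phase(cmath.exp(1j * (dphi[k] - carrier)))) for k in pulses]
    intervals = [b - a for a, b in zip(pulses, pulses[1:])]
    period = reduce(math.gcd, intervals)
    return carrier, amplitudes, period


# Hypothetical noise-free BPSK: 8 samples per symbol, offset 0.05 rad/sample
bits = [0, 1, 1, 0, 1, 0, 0, 1]
z = [cmath.exp(1j * (0.05 * n + math.pi * bits[n // 8])) for n in range(8 * len(bits))]
carrier, amps, period = analyse_transitions(z)
```

Here the five pulses all have an amplitude close to π (BPSK), the carrier estimate is 0.05 rad/sample and the symbol period is 8 samples; with noise and filtering, the GCD step would need a more tolerant estimator.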

The decision process is accomplished with a logic tree. The performance is given for a system operating with 1024 sampling points, a 3 kHz or 30 kHz bandwidth depending on the baud rate (from 75 to 19200 bps), and C/N0 between 35 and 70 dB. The performance is given as a function of the baud rate. It is shown that a low baud rate signal (below 200 bps) would require a much longer acquisition time (see Table 5).
Mod.   C/N0   Data rate   Classified as
type   (dB)   (bps)       BPSK (%)   QPSK (%)   CW (%)
BPSK   45        75          82         15         3
BPSK   45       300          98          2         0
BPSK   45      1200          95          5         0
QPSK   45      2400           1         99         0
QPSK   45      4800          16         83         1
QPSK   45     19200          89         11         0
BPSK   60        75          82         14         4
BPSK   60       300         100          0         0
BPSK   60      1200          96          4         0
QPSK   60      2400           0        100         0
QPSK   60      4800           2         98         0
QPSK   60     19200           0        100         0

Table 5: Mammone's results [10, p.28.4.6].


4.2.2  DeSimio 

For his Master's degree, DeSimio [4][5] made some simulations of ASK, FSK, BPSK and QPSK automatic recognition. Nine features are used. The mean and variance of the signal envelope are used to discriminate ASK signals. The other features are taken from spectra.

The next four features are the spectral locations and magnitudes of the two largest correlation peaks of the signal spectrum with a sinc² reference function. Two large peaks indicate FSK. The results of correlations with the spectra of the signal squared and raised to the fourth power provide information related to the number of phases: BPSK vs QPSK.
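The squared/fourth-power idea can be illustrated with a simplified sketch. It assumes baseband symbols with the carrier removed and one sample per symbol, and it measures the share of spectral energy in the strongest DFT bin instead of DeSimio's correlation with a sinc² reference; the 0.5 threshold and all names are hypothetical:

```python
import cmath
import math


def dft_power(x):
    """|X_k|^2 of the plain DFT (O(N^2), adequate for short blocks)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) ** 2
            for k in range(n)]


def line_fraction(x):
    """Share of spectral energy in the strongest DFT bin (1.0 = pure line)."""
    p = dft_power(x)
    return max(p) / sum(p)


def bpsk_or_qpsk(symbols, threshold=0.5):
    """A spectral line after squaring suggests BPSK; only after ^4, QPSK."""
    if line_fraction([s ** 2 for s in symbols]) > threshold:
        return "BPSK"        # squaring already removes the phase modulation
    if line_fraction([s ** 4 for s in symbols]) > threshold:
        return "QPSK"        # a fourth power is needed
    return "neither"


print(bpsk_or_qpsk([1, 1, 1, -1, 1, -1, -1]))       # BPSK
print(bpsk_or_qpsk([1, -1, 1, 1j, -1, -1j, 1j]))    # QPSK
```

In the QPSK example the squared symbols form a ±1 sequence whose energy is spread over all bins, while their fourth powers are all 1 and collapse onto a single line.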

Unfortunately, the simulation was done with fixed baud rate signals (2500 baud); therefore the features presented could not be applied to the general case of an unknown bit rate.

The decision process uses an adaptive technique. The LMS algorithm, which is closely related to the perceptron, optimizes the weight vectors during a long learning process. Once the learning process is finished, the system is ready for testing with unknown signals. DeSimio tested the classifier with only 16 samples (SNRs from 20 to TidB), for which he obtained no misclassification.
The perceptron classifier used by DeSimio could be seen, by extension, as a very simple non-recurrent neural network with no hidden layer and a hard-limiting transfer function. The processing element of the classifier is shown in Figure 29.


Figure 29: A single unit of the perceptron classifier (hard limiter, threshold logic) [47, p.5].


As in the case of the linear classifier, the output is computed by vector multiplications, and there are as many outputs in the network as classes. However, the weight vectors W are not determined by statistical methods. Via a long learning process (OOO 000 iterations), the five weight vectors (one per modulation type) are determined non-parametrically. In Figure 29, the transfer functions for the perceptron and the LMS algorithm are represented by the threshold logic and the hard limiter, respectively.
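The following is a minimal sketch of non-parametric weight learning with one weight vector per class. It uses the multi-class perceptron rule (update only on mistakes, argmax over class outputs) rather than a per-unit threshold, and does not implement LMS; the toy 2-D feature vectors are hypothetical:

```python
def predict(weights, x):
    """Index of the class whose weight vector gives the largest output."""
    xb = list(x) + [1.0]                              # append a bias input
    scores = [sum(w * v for w, v in zip(wc, xb)) for wc in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train_perceptron(samples, labels, n_classes, epochs=100, rate=1.0):
    """Multi-class perceptron: one weight vector per class, updated on errors."""
    dim = len(samples[0]) + 1
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            pred = predict(weights, x)
            if pred != y:                             # update only on mistakes
                errors += 1
                xb = list(x) + [1.0]
                weights[y] = [w + rate * v for w, v in zip(weights[y], xb)]
                weights[pred] = [w - rate * v for w, v in zip(weights[pred], xb)]
        if errors == 0:                               # training set separated
            break
    return weights


# Hypothetical 2-D features for three classes
X = [(1, 0), (0.9, 0.1), (1.1, -0.1), (0, 1), (0.1, 0.9), (-0.1, 1.1),
     (-1, -1), (-0.9, -1.1), (-1.1, -0.9)]
Y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
W = train_perceptron(X, Y, n_classes=3)
print([predict(W, x) for x in X])
```

For linearly separable training data this rule stops updating after a finite number of mistakes; unlike the statistical linear classifier, no class means or covariance matrices are estimated.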


5.0  MR  BASED  ON  ENERGY  DETECTION  ALGORITHMS 

As described so far, modulation recognition has been accomplished using conventional pattern recognition techniques. In this chapter, a different approach is presented, based on the premise that a detector which responds to one modulation type, and only one, thereby recognizes it. This chapter presents modulation recognition based on energy detection algorithms.


5.1  READY  APPROACH 

A "Modulation Detector and Classifier" was registered by Ready [48], who presented a new approach to some forms of digital modulation classification. It is quite a complex hardware system, so the complete details will not be explained here.


Figure 30: One of the 3 similar stages of Ready's system [48, p.2].
Figure 31: Example of the processing performed by the system: a) received spectrum, b) noise path, c) signal path [48, p.9].
The system is basically built by cascading N similar stages: stage 1 detects constant-phase modulation schemes (M-PSK), stage 2 linear-phase schemes (M-FSK) and stage 3 quadratic-phase signals (linear FM or chirp). Higher order phase modulations could be implemented. The system performs classification by characterizing the phase. This hardware could not be modified for analog modulation types.

In Figure 30, signal 22 comes from the preceding stage, and signal 22L goes to the following stage. Signals 31L and 26L are used by the decision circuits. The system performs both signal detection and classification; it therefore allows more than one modulation type in the bandwidth of interest. The system would also work in the presence of interfering signals.

Figure 31-a shows an example of a received spectrum where several modulation types are present. The detection of a constant-phase signal, PSK in the example, would be performed by stage 1. The output of the first stage noise path envelope detector is an estimate of the noise amplitude. This threshold value is used to detect the presence of a signal at the output of the first stage signal path envelope detector. Figure 31-c presents that output. We can see the same spectrum as in Figure 31-b, but with two additional narrow spectral lines above the noise threshold. These spectral lines are symmetric about the carrier frequency and indicate the presence of a constant-phase signal and its symbol rate. The last path, i.e. the phase extractor, gives information on the phase difference between the clock circuit and the data clock of the received signal.


5.2  KIM  APPROACH 

Kim [49][50] presented his energy detector as a way to classify phase-modulated signals. Using only one feature, the qLLR defined below, he was able to discriminate BPSK from QPSK at very low SNR (0 dB). The algorithm is as follows:

$$q_{LLR} = (S_I - S_Q)^2 + 4\,S_{IQ}^2$$

where all S are energy-related and defined by

$$S_I = \sum_{n=1}^{N} r_{I,n}^2, \qquad S_Q = \sum_{n=1}^{N} r_{Q,n}^2, \qquad S_{IQ} = \sum_{n=1}^{N} r_{I,n}\,r_{Q,n}$$

$$r_{I,n} = \int_{(n-1)T}^{nT} r(t)\cos\omega_c t\,dt, \qquad n = 1,\ldots,N$$

$$r_{Q,n} = \int_{(n-1)T}^{nT} r(t)\sin\omega_c t\,dt, \qquad n = 1,\ldots,N$$

and r(t) is the received signal containing AWGN.
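A sketch of the energy statistic, assuming it takes the phase-invariant form (S_I - S_Q)² + 4S_IQ², which equals |Σ r_n²|² for r_n = r_{I,n} + j r_{Q,n}. The matched-filter outputs are assumed given, which implies perfect carrier and symbol timing, as Kim assumed. The normalization by (S_I + S_Q)², the 0.5 threshold and the function names are hypothetical:

```python
def qllr(r_i, r_q):
    """(S_I - S_Q)^2 + 4 S_IQ^2 from the in-phase and quadrature outputs."""
    s_i = sum(a * a for a in r_i)
    s_q = sum(b * b for b in r_q)
    s_iq = sum(a * b for a, b in zip(r_i, r_q))
    return (s_i - s_q) ** 2 + 4 * s_iq ** 2


def classify_bpsk_qpsk(r_i, r_q, threshold=0.5):
    """Normalize by (S_I + S_Q)^2 so that the statistic lies in [0, 1]."""
    energy = sum(a * a + b * b for a, b in zip(r_i, r_q))
    return "BPSK" if qllr(r_i, r_q) / energy ** 2 > threshold else "QPSK"


# BPSK at an arbitrary carrier phase vs. one QPSK symbol of each phase
print(classify_bpsk_qpsk([0.6, -0.6, 0.6], [0.8, -0.8, 0.8]))   # BPSK
print(classify_bpsk_qpsk([1, 0, -1, 0], [0, 1, 0, -1]))          # QPSK
```

The BPSK points all square to the same complex value, so the statistic reaches its maximum regardless of carrier phase, whereas the four QPSK phases cancel after squaring.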
The proposed feature seems to perform well in AWGN. Results are given for SNRs as low as 0 dB with close to 100% correct classification. However, Kim assumed precise knowledge of the carrier frequency and symbol timing. In a real system, accurate determination of these might be very difficult for signals as noisy as 0 dB SNR, and might lead to significant performance degradation. Moreover, it is not clear that the system will not react to other modulation schemes that might be presented to the Modulation Recognition Sub-System in a real system. This technique is applicable only to digital phase modulation schemes.


5.3  GARDNER  APPROACH 

Gardner is well known in the energy detection area. In a paper [51], he proposed the use of his techniques for modulation classification purposes. The interception of LPI signals is accomplished by exploiting cyclic features that are present in modulated signals: sinewave carrier, data rate, etc. The theory of spectral correlation in cyclostationary signals developed by Gardner leads to a general and flexible cyclic feature estimator called a cyclic spectrum analyser.

The process maps a time-varying signal (2-D) into a 3-D distribution using cyclic spectra. The distribution presents some characteristic shapes potentially useful for modulation classification.

The time-varying signal x(t) is cyclostationary if the parameter

$$R_x^{\alpha}(\tau) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} x(t+\tau/2)\,x^{*}(t-\tau/2)\,e^{-i2\pi\alpha t}\,dt$$

is not zero for some α ≠ 0. R_x^α(τ) is called the cyclic autocorrelation, and its Fourier transform is called the cyclic spectrum, S_x^α(f). It can be interpreted as a spectral correlation function via the following characterization:

$$S_x^{\alpha}(f) = \lim_{T\to\infty}\lim_{\Delta t\to\infty}\frac{1}{T\,\Delta t}\int_{-\Delta t/2}^{\Delta t/2} X_T(t, f+\alpha/2)\,X_T^{*}(t, f-\alpha/2)\,dt$$

where

$$X_T(t, \nu) = \int_{t-T/2}^{t+T/2} x(u)\,e^{-i2\pi\nu u}\,du$$

The set {α : R_x^α(τ) ≠ 0} is called the set of cycle frequencies. Some examples are presented in Figure 32.
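A finite-length, discrete-time estimate of the cyclic autocorrelation can be sketched as below, replacing the limit by an average over the available samples. The lag convention (τ = 2·half_lag samples), the test signal and the function name are assumptions of this sketch:

```python
import cmath
import math


def cyclic_autocorrelation(x, alpha, half_lag=0):
    """Finite-length estimate of R_x^alpha(tau) with tau = 2 * half_lag samples.

    alpha is in cycles per sample. The average runs over the indices n for
    which both x[n + half_lag] and x[n - half_lag] exist.
    """
    m = half_lag
    idx = range(m, len(x) - m)
    total = sum(
        x[n + m] * complex(x[n - m]).conjugate() * cmath.exp(-2j * math.pi * alpha * n)
        for n in idx
    )
    return total / len(idx)


# Real BPSK-like signal: +/-1 data (10 samples per symbol) on a carrier f0 = 0.1
N, f0 = 1000, 0.1
d = [1 if (n // 10) % 3 else -1 for n in range(N)]
x = [d[n] * math.cos(2 * math.pi * f0 * n) for n in range(N)]
for alpha in (0.0, 0.05, 0.2):
    print(alpha, abs(cyclic_autocorrelation(x, alpha)))
```

At zero lag the estimate is about 0.5 for α = 0, about 0.25 at the cycle frequency α = 2f0 = 0.2, and essentially zero at α = 0.05: the carrier-related cyclic feature survives even though the random data removes any simple spectral line at the carrier.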
Figure 32: Theoretical spectral correlation distributions [52, p.902]: a) BPSK, b) QPSK, c) OQPSK, d) MSK.


These distributions show some discriminating power among modulation types. However, Gardner did not discuss how sensitive these distributions are to parameters such as baud rate, modulation index, carrier frequency, etc. Nor did he discuss the ability of his cyclic features to characterize analog modulations, given the random nature of voice (a nonstationary process). The potential advantage of this technique is its ability to recognize a modulated signal in the presence of interference.
6.0 CONCLUSION


In this document, we have seen how different authors have performed modulation recognition. Using pattern recognition techniques, they have implemented classification algorithms that assign an unknown pattern vector, made of features judged discriminative, to one of several classes according to its similarity to the reference pattern vectors obtained during training.

For the classification process, it seems that the linear and the binary tree classifiers have been adopted almost unanimously. The linear classifier is the simplest of the parametric techniques. Since it uses only the pooled within-class covariance matrix to classify the unknown signal, this technique is very fast and easily implemented. For a real-time requirement such as MR, it is a good choice. However, for solving bigger problems, with numerous classes and features, it is usually advantageous to use more sophisticated algorithms. The quadratic classifier, by using quadratic functions instead of linear ones to discriminate clusters, could present improved performance, depending upon the particular application. The cost is an additional amount of memory and computation time for manipulating the within-class covariance matrices (one per class).
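To make the pooled-covariance idea concrete, here is a sketch of Fisher-style linear classification functions, assuming equal class priors: g_c(x) = μ_cᵀS⁻¹x − ½μ_cᵀS⁻¹μ_c, where S is the pooled within-class covariance matrix. The helper names and the toy data are hypothetical:

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def pooled_covariance(classes):
    """Pooled within-class covariance (divisor: total samples - number of classes)."""
    d = len(classes[0][0])
    s = [[0.0] * d for _ in range(d)]
    total = 0
    for samples in classes:
        mu = mean(samples)
        for v in samples:
            diff = [v[i] - mu[i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += diff[i] * diff[j]
        total += len(samples)
    dof = total - len(classes)
    return [[s[i][j] / dof for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def train_linear(classes):
    """One (w, w0) pair per class: w = S^-1 mu, w0 = -0.5 mu^T S^-1 mu."""
    s = pooled_covariance(classes)
    params = []
    for samples in classes:
        mu = mean(samples)
        w = solve(s, mu)
        params.append((w, -0.5 * sum(wi * mi for wi, mi in zip(w, mu))))
    return params


def classify(params, x):
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + w0 for w, w0 in params]
    return max(range(len(scores)), key=scores.__getitem__)


class0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
class1 = [(4, 4), (5, 4), (4, 5), (5, 5)]
params = train_linear([class0, class1])
print(classify(params, (0.2, 0.3)), classify(params, (4.8, 4.1)))   # 0 1
```

Only one matrix is inverted for all classes, which is why the method is fast; a quadratic classifier would instead need one covariance matrix per class.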

Although these parametric techniques could be optimal in some applications (typically when the distributions are normal), the more general nonparametric techniques sometimes need to be investigated (for example, when distributions are multi-modal). In this technical note, only the binary tree concept has been discussed. Modern tree-building algorithms are very powerful for handling a large number of classes and features. The CART algorithm is probably the best known.

Note that other nonparametric algorithms also exist in the classical pattern-recognition
literature [15]. The kernel (Parzen estimator) and K-nearest-neighbor approaches are very well
known. In the first case, the density function is not assumed to be known, as it is in the
parametric approach. Instead, it is approximated by a sum of kernels. These can be any
functions, although normal kernels are usually used. For classification, the unknown signal
is compared with all N training samples through the kernel function in use. It is then
assigned to the class with the highest overall probability. The K-nearest-neighbor rule
simplifies the decision by considering only the K samples closest to the unknown (instead of
all N samples) to compute the probabilities p(ω_i|X). Unfortunately, it is still necessary to
store all N samples and to compare the unknown with all of them to find the K closest points.
Therefore, the amount of computation is not significantly decreased. To overcome this
disadvantage, it would be interesting to eliminate samples and keep only some good
representatives (located along the class boundaries). In [15], this process is called the
condensed K-nearest-neighbor decision rule.
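
These nonparametric rules can be sketched in Python with NumPy. The Euclidean distance, the isotropic normal kernel of width h and the function names are assumptions made for illustration. The condensing routine follows Hart's classical procedure for the 1-NN case rather than any specific variant from [15].

```python
import numpy as np


def parzen_classify(x, X_train, y_train, h=1.0):
    """Kernel (Parzen) rule with a normal kernel of width h. The kernel sum over
    one class's samples is proportional to prior x density, so the largest wins."""
    w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * h ** 2))
    classes = np.unique(y_train)
    return classes[int(np.argmax([w[y_train == c].sum() for c in classes]))]


def knn_classify(x, X_train, y_train, k=3):
    """K-nearest-neighbour rule: majority vote among the k closest samples.
    Note that all stored samples are still examined to find those k."""
    d = np.linalg.norm(X_train - x, axis=1)
    vals, counts = np.unique(y_train[np.argsort(d)[:k]], return_counts=True)
    return vals[np.argmax(counts)]


def condense(X, y):
    """Hart-style condensing: repeatedly add any training sample that the 1-NN
    rule on the current subset misclassifies. The kept samples tend to lie near
    the class boundaries."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i not in keep and knn_classify(X[i], X[keep], y[keep], k=1) != y[i]:
                keep.append(i)
                changed = True
    return np.array(keep)
```

After condensing, only `X[idx]` needs to be stored and searched. The result is usually a small fraction of the N training samples when the classes are well separated.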


Another alternative could also be applied to the MR problem. It was not proposed by the
authors mentioned in this paper, but it is very popular in modern pattern-recognition
research centers. Neural networks are very interesting nonparametric techniques which
provide efficiency and extreme versatility, with numerous applications. Interest in these
artificial brains is fairly recent, but some algorithms are becoming well established, such
as the back-propagation network. It is highly likely that some new MR systems will use neural
networks in the future, once neural network VLSI chips become available on the market.
Although the choice of the classifier is very important, the selection of good features
is fundamental. The features have to be discriminative and noise resistant, and they must
require as little computation as possible. A feature is obtained from the probability
distribution (histogram) of a parameter. That parameter is usually computed on a
point-to-point basis from the sampled signal waveform. Popular parameters are the signal
envelope, the instantaneous frequency, etc. Some authors also used the frequency distribution
(PSD) to represent the parameter. In both cases, the features are chosen to represent the
characteristics of the distribution efficiently. Very often the mean and standard deviation
are used, although the skewness and the kurtosis are also popular (Einicke and Hipp). Another
approach uses the bin values of the distribution directly as features (Jondral).
Unfortunately, the latter gives rise to a very high-dimensional feature vector. To avoid that
problem, only the most significant bins can be used: Miller used the first bin, p(0), of the
signal envelope distribution. See Table 6 for more examples.
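
The following Python sketch with NumPy shows how such histogram features might be computed from a sampled real waveform. The envelope A is taken as the magnitude of the analytic signal, built here with an FFT-based Hilbert transform. Several choices are assumptions and not the settings used by Miller, Jondral, Einicke or Hipp: normalizing A to unit mean, the 16-bin histogram over [0, 3) and the function names.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies and double
    the positive ones (DC and, for even length, Nyquist are kept once)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def envelope_features(x, n_bins=16, a_max=3.0):
    """Moments and histogram bins of the envelope A normalised to unit mean."""
    a = np.abs(analytic_signal(x))
    a = a / a.mean()
    sigma = a.std()
    if sigma < 1e-9:
        skew = kurt = float("nan")  # constant envelope: higher moments undefined
    else:
        z = (a - a.mean()) / sigma
        skew, kurt = np.mean(z ** 3), np.mean(z ** 4)
    hist, _ = np.histogram(a, bins=n_bins, range=(0.0, a_max))
    p = hist / len(a)  # bin probabilities (samples beyond a_max are dropped)
    return {"mean": a.mean(), "std": sigma, "skewness": skew, "kurtosis": kurt,
            "bins": p,      # Jondral-style: all bin values as features
            "p0": p[0]}     # Miller-style: only the first bin


n = np.arange(1000)
carrier = np.cos(2 * np.pi * 0.2 * n)
am = (1 + 0.8 * np.cos(2 * np.pi * 0.005 * n)) * carrier
ook = ((n // 100) % 2 == 0) * carrier  # on-off keyed carrier
for name, sig in [("tone", carrier), ("AM", am), ("OOK", ook)]:
    f = envelope_features(sig)
    print(name, round(f["std"], 3), round(f["p0"], 3))
```

An on-off keyed signal puts much of its envelope mass near zero, while a constant-envelope tone puts none there. This contrast is the kind of information a single bin such as p(0) can capture.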


Section  Author    Parameter             Time/Freq  Feature
3.1      Miller    Signal envelope A     T          Bin value p(0)
3.2      Gadbois   A²                    T          σ²(A²)
         Gallant   σ²                    T          σ(σ²(A²))
3.3      Fabrizi   Instant. freq. F_i    T          P(F_i)
3.4      Aisbett   A, F_i                T          AA'
                                         T          II
3.5      Hipp      PSD(signal R)         F          σ(PSD(R))
                   PSD(signal R²)        F          σ(PSD(R²))
4.1      Jondral   A, F_i and F_i'       T          Bin values

Table 6: Examples of features used for modulation recognition
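
Several of the Table 6 features can be sketched in the same way. The Python/NumPy code below is illustrative only. The normalizations, the block count for the Gallant-style feature and the function names are assumptions, and the Aisbett entries are not attempted.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (negative frequencies removed)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def squared_envelope(x):
    """A^2 normalised to unit mean (the normalisation is an assumption)."""
    a2 = np.abs(analytic_signal(x)) ** 2
    return a2 / a2.mean()


def sigma2_env2(x):
    """Gadbois-style feature: variance of the squared envelope, σ²(A²)."""
    return squared_envelope(x).var()


def gallant_feature(x, n_blocks=8):
    """Gallant-style feature: standard deviation of σ²(A²) measured on
    successive blocks of the record (the block count is an assumption)."""
    blocks = np.array_split(squared_envelope(x), n_blocks)
    return np.std([b.var() for b in blocks])


def instantaneous_frequency(x):
    """F_i in cycles/sample from the unwrapped phase of the analytic signal;
    a histogram of these values gives a Fabrizi-style P(F_i)."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) / (2 * np.pi)


def sigma_psd(r):
    """Hipp-style feature: standard deviation of the periodogram of r,
    scaled to unit maximum (the scaling is an assumption). Apply it to R and R²."""
    psd = np.abs(np.fft.rfft(r)) ** 2
    return np.std(psd / psd.max())


n = np.arange(1000)
tone = np.cos(2 * np.pi * 0.2 * n)
am = (1 + 0.8 * np.cos(2 * np.pi * 0.005 * n)) * tone
print(sigma2_env2(tone), sigma2_env2(am), sigma_psd(tone), sigma_psd(tone ** 2))
```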


It would be interesting at this stage to investigate a stepwise selection of these
features to obtain a discriminative pattern that performs well with noisy signals and short
acquisition times. New features would also be very desirable, namely features able to deal
with lower SNRs, shorter acquisition times, and more modulation types. From a military
perspective, classification of signals with SNRs as low as 5 to 10 dB is of real interest.
Gallant removed gaps in the voice by using a very long acquisition time (a little under
2 seconds!), which is not adequate for all applications. A new feature able to discriminate
analogue modulation types with a shorter acquisition time would be very desirable. Finally,
it would be worthwhile to be able to recognize other power-efficient modulation types such
as MSK, OQPSK, GMSK and TFM.
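
A stepwise selection could start from its simplest variant, sequential forward selection. This method greedily adds whichever feature most improves a separability criterion. In the Python/NumPy sketch below, the criterion is the leave-one-out 1-NN accuracy. That choice is an assumption, like the function names; a backward-elimination step or a Fisher-type criterion would be equally reasonable.

```python
import numpy as np


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of the 1-nearest-neighbour rule on feature set X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return np.mean(y[np.argmin(d, axis=1)] == y)


def forward_selection(X, y, n_features, criterion=loo_1nn_accuracy):
    """Sequential forward selection: at each step, add the remaining feature
    whose inclusion gives the best criterion value; record that value."""
    selected, remaining, history = [], list(range(X.shape[1])), []
    while remaining and len(selected) < n_features:
        scores = [criterion(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
        history.append(max(scores))
    return selected, history
```

Running the selection on feature vectors extracted at several SNRs and acquisition lengths would show which features remain useful under those conditions.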